Attack Methods

Ignore This Title and HackAPrompt: Exposing Systemic Vulnerabilities of LLMs through a Global Scale Prompt Hacking Competition

Authors: Sander Schulhoff, Jeremy Pinto, Anaum Khan, Louis-François Bouchard, Chenglei Si, Svetlina Anati, Valen Tagliabue, Anson Liu Kost, Christopher Carnahan, Jordan Boyd-Graber | Published: 2023-10-24 | Updated: 2024-03-03
Text Generation Methods
Prompt Injection
Attack Methods

Deceptive Fairness Attacks on Graphs via Meta Learning

Authors: Jian Kang, Yinglong Xia, Ross Maciejewski, Jiebo Luo, Hanghang Tong | Published: 2023-10-24
GNN
Attack Methods
Evaluation Metrics

AutoDAN: Interpretable Gradient-Based Adversarial Attacks on Large Language Models

Authors: Sicheng Zhu, Ruiyi Zhang, Bang An, Gang Wu, Joe Barrow, Zichao Wang, Furong Huang, Ani Nenkova, Tong Sun | Published: 2023-10-23 | Updated: 2023-12-14
Prompt Injection
Safety Alignment
Attack Methods

A Comprehensive Study of Privacy Risks in Curriculum Learning

Authors: Joann Qiongna Chen, Xinlei He, Zheng Li, Yang Zhang, Zhou Li | Published: 2023-10-16
Membership Inference
Model Performance Evaluation
Attack Methods

LMSanitator: Defending Prompt-Tuning Against Task-Agnostic Backdoors

Authors: Chengkun Wei, Wenlong Meng, Zhikun Zhang, Min Chen, Minghu Zhao, Wenjing Fang, Lei Wang, Zihui Zhang, Wenzhi Chen | Published: 2023-08-26 | Updated: 2023-10-14
Trigger Detection
Backdoor Model Detection
Attack Methods

Why Don’t You Clean Your Glasses? Perception Attacks with Dynamic Optical Perturbations

Authors: Yi Han, Matthew Chan, Eric Wengrowski, Zhuohuan Li, Nils Ole Tippenhauer, Mani Srivastava, Saman Zonouz, Luis Garcia | Published: 2023-07-24 | Updated: 2023-07-27
Attack Methods
Adversarial Examples
Physical Attacks

LLM Censorship: A Machine Learning Challenge or a Computer Security Problem?

Authors: David Glukhov, Ilia Shumailov, Yarin Gal, Nicolas Papernot, Vardan Papyan | Published: 2023-07-20
Security Analysis
Program Verification
Attack Methods

Few-shot Multi-domain Knowledge Rearming for Context-aware Defence against Advanced Persistent Threats

Authors: Gaolei Li, Yuanyuan Zhao, Wenqi Wei, Yuchen Liu | Published: 2023-06-13 | Updated: 2023-06-14
APT Defense Methods
Attack Methods
Malware Detection with Limited Samples

Zero-Day Threats Detection for Critical Infrastructures

Authors: Mike Nkongolo, Mahmut Tokmak | Published: 2023-06-10
Random Forest
Attack Methods
Statistical Methods

Detecting Adversarial Directions in Deep Reinforcement Learning to Make Robust Decisions

Authors: Ezgi Korkmaz, Jonah Brown-Cohen | Published: 2023-06-09
Attack Methods
Adversarial Training
Behavior Analysis Methods