バックドア攻撃

Universal Vulnerabilities in Large Language Models: Backdoor Attacks for In-context Learning

Authors: Shuai Zhao, Meihuizi Jia, Luu Anh Tuan, Fengjun Pan, Jinming Wen | Published: 2024-01-11 | Updated: 2024-10-09
バックドア攻撃
プロンプトインジェクション

Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training

Authors: Evan Hubinger, Carson Denison, Jesse Mu, Mike Lambert, Meg Tong, Monte MacDiarmid, Tamera Lanham, Daniel M. Ziegler, Tim Maxwell, Newton Cheng, Adam Jermyn, Amanda Askell, Ansh Radhakrishnan, Cem Anil, David Duvenaud, Deep Ganguli, Fazl Barez, Jack Clark, Kamal Ndousse, Kshitij Sachan, Michael Sellitto, Mrinank Sharma, Nova DasSarma, Roger Grosse, Shauna Kravec, Yuntao Bai, Zachary Witten, Marina Favaro, Jan Brauner, Holden Karnofsky, Paul Christiano, Samuel R. Bowman, Logan Graham, Jared Kaplan, Sören Mindermann, Ryan Greenblatt, Buck Shlegeris, Nicholas Schiefer, Ethan Perez | Published: 2024-01-10 | Updated: 2024-01-17
バックドア攻撃
プロンプトインジェクション
強化学習

MalModel: Hiding Malicious Payload in Mobile Deep Learning Models with Black-box Backdoor Attack

Authors: Jiayi Hua, Kailong Wang, Meizhen Wang, Guangdong Bai, Xiapu Luo, Haoyu Wang | Published: 2024-01-05
バックドア攻撃
マルウェア分類
モデル性能評価

FedBayes: A Zero-Trust Federated Learning Aggregation to Defend Against Adversarial Attacks

Authors: Marc Vucovich, Devin Quinn, Kevin Choi, Christopher Redino, Abdul Rahman, Edward Bowen | Published: 2023-12-04
バックドア攻撃
悪意のあるクライアント
連合学習

Understanding Variation in Subpopulation Susceptibility to Poisoning Attacks

Authors: Evan Rose, Fnu Suya, David Evans | Published: 2023-11-20
サブポピュレーション特性
バックドア攻撃
ポイズニング攻撃

TextGuard: Provable Defense against Backdoor Attacks on Text Classification

Authors: Hengzhi Pei, Jinyuan Jia, Wenbo Guo, Bo Li, Dawn Song | Published: 2023-11-19 | Updated: 2023-11-25
テキスト生成手法
バックドア攻撃
ポイズニング

Stealthy and Persistent Unalignment on Large Language Models via Backdoor Injections

Authors: Yuanpu Cao, Bochuan Cao, Jinghui Chen | Published: 2023-11-15 | Updated: 2024-06-09
バックドア攻撃
プロンプトインジェクション

Label Poisoning is All You Need

Authors: Rishi D. Jha, Jonathan Hayase, Sewoong Oh | Published: 2023-10-29
セキュリティ分析
バックドア攻撃
悪意のある行為者の分類

On the Detection of Image-Scaling Attacks in Machine Learning

Authors: Erwin Quiring, Andreas Müller, Konrad Rieck | Published: 2023-10-23
バックドア攻撃
敵対的攻撃検出
検出手法の分析

FLEDGE: Ledger-based Federated Learning Resilient to Inference and Backdoor Attacks

Authors: Jorge Castillo, Phillip Rieger, Hossein Fereidooni, Qian Chen, Ahmad Sadeghi | Published: 2023-10-03
バックドア攻撃
プライバシー保護
ポイズニング