Backdoor Attacks

Injecting Undetectable Backdoors in Obfuscated Neural Networks and Language Models

Authors: Alkis Kalavasis, Amin Karbasi, Argyris Oikonomou, Katerina Sotiraki, Grigoris Velegkas, Manolis Zampetakis | Published: 2024-06-09 | Updated: 2024-09-07
Watermarking
Backdoor Attack

BadAgent: Inserting and Activating Backdoor Attacks in LLM Agents

Authors: Yifei Wang, Dizhan Xue, Shengjie Zhang, Shengsheng Qian | Published: 2024-06-05
LLM Security
Backdoor Attack
Prompt Injection

No Vandalism: Privacy-Preserving and Byzantine-Robust Federated Learning

Authors: Zhibo Xing, Zijian Zhang, Zi'ang Zhang, Jiamou Liu, Liehuang Zhu, Giovanni Russello | Published: 2024-06-03
Watermarking
Backdoor Attack
Poisoning

PureGen: Universal Data Purification for Train-Time Poison Defense via Generative Model Dynamics

Authors: Sunay Bhat, Jeffrey Jiang, Omead Pooladzandi, Alexander Branch, Gregory Pottie | Published: 2024-05-28 | Updated: 2024-06-02
Watermarking
Backdoor Attack
Poisoning

Can We Trust Embodied Agents? Exploring Backdoor Attacks against Embodied LLM-based Decision-Making Systems

Authors: Ruochen Jiao, Shaoyuan Xie, Justin Yue, Takami Sato, Lixu Wang, Yixuan Wang, Qi Alfred Chen, Qi Zhu | Published: 2024-05-27 | Updated: 2024-10-05
LLM Security
Backdoor Attack
Prompt Injection

BadGD: A unified data-centric framework to identify gradient descent vulnerabilities

Authors: Chi-Hua Wang, Guang Cheng | Published: 2024-05-24
Backdoor Attack
Poisoning

A GAN-Based Data Poisoning Attack Against Federated Learning Systems and Its Countermeasure

Authors: Wei Sun, Bo Gao, Ke Xiong, Yuwei Wang | Published: 2024-05-19 | Updated: 2024-05-21
Backdoor Attack
Poisoning
Defense Methods

Concealing Backdoor Model Updates in Federated Learning by Trigger-Optimized Data Poisoning

Authors: Yujie Zhang, Neil Gong, Michael K. Reiter | Published: 2024-05-10 | Updated: 2024-09-09
Backdoor Attack
Poisoning

Unlearning Backdoor Attacks through Gradient-Based Model Pruning

Authors: Kealan Dunnett, Reza Arablouei, Dimity Miller, Volkan Dedeoglu, Raja Jurdak | Published: 2024-05-07
Backdoor Attack
Model Performance Evaluation

TuBA: Cross-Lingual Transferability of Backdoor Attacks in LLMs with Instruction Tuning

Authors: Xuanli He, Jun Wang, Qiongkai Xu, Pasquale Minervini, Pontus Stenetorp, Benjamin I. P. Rubinstein, Trevor Cohn | Published: 2024-04-30 | Updated: 2025-03-17
Content Moderation
Backdoor Attack
Prompt Injection