敵対的学習

BadLingual: A Novel Lingual-Backdoor Attack against Large Language Models

Authors: Zihan Wang, Hongwei Li, Rui Zhang, Wenbo Jiang, Kangjie Chen, Tianwei Zhang, Qingchuan Zhao, Guowen Xu | Published: 2025-05-06
RAGへのポイズニング攻撃
バックドア攻撃対策
敵対的学習

Bayesian Robust Aggregation for Federated Learning

Authors: Aleksandr Karakulev, Usama Zafar, Salman Toor, Prashant Singh | Published: 2025-05-05
グループベースの堅牢性
トリガーの検知
敵対的学習

How to Backdoor the Knowledge Distillation

Authors: Chen Wu, Qian Ma, Prasenjit Mitra, Sencun Zhu | Published: 2025-04-30
バックドア攻撃
敵対的学習
知識蒸留の脆弱性

GIFDL: Generated Image Fluctuation Distortion Learning for Enhancing Steganographic Security

Authors: Xiangkun Wang, Kejiang Chen, Yuang Qi, Ruiheng Liu, Weiming Zhang, Nenghai Yu | Published: 2025-04-21
敵対的学習
生成モデル
透かし技術

Stop Walking in Circles! Bailing Out Early in Projected Gradient Descent

Authors: Philip Doldo, Derek Everett, Amol Khanna, Andre T Nguyen, Edward Raff | Published: 2025-03-25
敵対的サンプルの脆弱性
敵対的学習
深層ネットワークの堅牢性

TAET: Two-Stage Adversarial Equalization Training on Long-Tailed Distributions

Authors: Wang YuHang, Junkang Guo, Aolei Liu, Kaihao Wang, Zaitong Wu, Zhenyu Liu, Wenfei Yin, Jian Liu | Published: 2025-03-02 | Updated: 2025-03-21
ロバスト性
敵対的学習
敵対的訓練

SATA: A Paradigm for LLM Jailbreak via Simple Assistive Task Linkage

Authors: Xiaoning Dong, Wenbo Hu, Wei Xu, Tianxing He | Published: 2024-12-19 | Updated: 2025-03-21
プロンプトインジェクション
大規模言語モデル
敵対的学習

Protecting Confidentiality, Privacy and Integrity in Collaborative Learning

Authors: Dong Chen, Alice Dethise, Istemi Ekin Akkus, Ivica Rimac, Klaus Satzke, Antti Koskela, Marco Canini, Wei Wang, Ruichuan Chen | Published: 2024-12-11 | Updated: 2025-04-17
プライバシー保護フレームワーク
差分プライバシー
敵対的学習

Robust LLM safeguarding via refusal feature adversarial training

Authors: Lei Yu, Virginie Do, Karen Hambardzumyan, Nicola Cancedda | Published: 2024-09-30 | Updated: 2025-03-20
プロンプトインジェクション
モデルの堅牢性
敵対的学習

LLM Lies: Hallucinations are not Bugs, but Features as Adversarial Examples

Authors: Jia-Yu Yao, Kun-Peng Ning, Zhen-Hui Liu, Mu-Nan Ning, Yu-Yang Liu, Li Yuan | Published: 2023-10-02 | Updated: 2024-08-04
ハルシネーション
敵対的サンプルの脆弱性
敵対的学習