Adversarial Learning

Mitigating Fine-tuning Risks in LLMs via Safety-Aware Probing Optimization

Authors: Chengcan Wu, Zhixin Zhang, Zeming Wei, Yihao Zhang, Meng Sun | Published: 2025-05-22
LLM Security
Alignment
Adversarial Learning

SuperPure: Efficient Purification of Localized and Distributed Adversarial Patches via Super-Resolution GAN Models

Authors: Hossein Khalili, Seongbin Park, Venkat Bollapragada, Nader Sehatbakhsh | Published: 2025-05-22
Adversarial Learning
Computational Complexity
Defense Mechanisms

Adversarially Pretrained Transformers may be Universally Robust In-Context Learners

Authors: Soichiro Kumano, Hiroshi Kera, Toshihiko Yamasaki | Published: 2025-05-20
Model Robustness Guarantees
Relationship Between Robustness and Privacy
Adversarial Learning

FlowPure: Continuous Normalizing Flows for Adversarial Purification

Authors: Elias Collaert, Abel Rodríguez, Sander Joos, Lieven Desmet, Vera Rimmer | Published: 2025-05-19
Robustness Improvement Methods
Adversarial Learning
Analysis of Defense Effectiveness

Evaluating the Robustness of Adversarial Defenses in Malware Detection Systems

Authors: Mostafa Jafari, Alireza Shameli-Sendi | Published: 2025-05-14
Robustness Analysis
Attack Detection Methods
Adversarial Learning

BadLingual: A Novel Lingual-Backdoor Attack against Large Language Models

Authors: Zihan Wang, Hongwei Li, Rui Zhang, Wenbo Jiang, Kangjie Chen, Tianwei Zhang, Qingchuan Zhao, Guowen Xu | Published: 2025-05-06
Poisoning Attacks on RAG
Backdoor Attack Countermeasures
Adversarial Learning

Bayesian Robust Aggregation for Federated Learning

Authors: Aleksandr Karakulev, Usama Zafar, Salman Toor, Prashant Singh | Published: 2025-05-05
Group-Based Robustness
Trigger Detection
Adversarial Learning
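
For background on the robust-aggregation setting of the entry above, here is a minimal, hypothetical sketch of a classical robustness baseline (coordinate-wise median aggregation of client updates), assuming PyTorch tensors; it is not the Bayesian aggregation rule proposed in the paper.

```python
import torch

def median_aggregate(client_updates):
    """Coordinate-wise median of flattened client updates.

    A simple baseline that tolerates a minority of corrupted clients;
    shown only as context, not the paper's Bayesian method.
    """
    stacked = torch.stack(client_updates, dim=0)  # shape: (num_clients, num_params)
    return stacked.median(dim=0).values

# Example: three honest updates and one corrupted outlier.
updates = [torch.ones(4), torch.ones(4) * 1.1, torch.ones(4) * 0.9, torch.ones(4) * 100.0]
print(median_aggregate(updates))  # stays near 1.0 despite the outlier
```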

How to Backdoor the Knowledge Distillation

Authors: Chen Wu, Qian Ma, Prasenjit Mitra, Sencun Zhu | Published: 2025-04-30
Backdoor Attack
Adversarial Learning
Vulnerability of Knowledge Distillation
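
For context on the distillation pipeline targeted by the entry above, the following is a minimal sketch of a standard knowledge-distillation loss (temperature-scaled KL between teacher and student logits plus a hard-label term), assuming PyTorch; the backdoor-injection procedure studied in the paper is not shown.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    # Soft-target term: KL divergence between temperature-softened distributions.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    # Hard-target term: ordinary cross-entropy against the ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```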

GIFDL: Generated Image Fluctuation Distortion Learning for Enhancing Steganographic Security

Authors: Xiangkun Wang, Kejiang Chen, Yuang Qi, Ruiheng Liu, Weiming Zhang, Nenghai Yu | Published: 2025-04-21
Adversarial Learning
Generative Models
Watermarking Technology

Stop Walking in Circles! Bailing Out Early in Projected Gradient Descent

Authors: Philip Doldo, Derek Everett, Amol Khanna, Andre T Nguyen, Edward Raff | Published: 2025-03-25
Vulnerability to Adversarial Examples
Adversarial Learning
Robustness of Deep Networks
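
To make the setting of the entry above concrete, here is a minimal L-infinity PGD attack loop with a naive early-exit heuristic (stop once every input in the batch is misclassified), assuming a PyTorch classifier; the actual bailing-out criterion proposed in the paper may differ.

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=40):
    """PGD with random start and a simple early-exit check (illustrative only)."""
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1).detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        logits = model(x_adv)
        # Naive bail-out: stop iterating once the whole batch is already fooled.
        if (logits.argmax(dim=1) != y).all():
            break
        loss = F.cross_entropy(logits, y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()                    # gradient ascent step
            x_adv = torch.min(torch.max(x_adv, x - eps), x + eps)  # project to eps-ball
            x_adv = x_adv.clamp(0, 1)                              # keep valid pixel range
    return x_adv.detach()
```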