Defense Methods | Page 5 | AI Security Portal

OTAD: An Optimal Transport-Induced Robust Model for Agnostic Adversarial Attack

Authors: Kuo Gai, Sicong Wang, Shihua Zhang | Published: 2024-08-01

Adversarial Training

Optimization Problem

Defense Methods

2024.08.01 2025.04.03

Literature Database

Variational Randomized Smoothing for Sample-Wise Adversarial Robustness

Authors: Ryo Hase, Ye Wang, Toshiaki Koike-Akino, Jing Liu, Kieran Parsons | Published: 2024-07-16

Regularization

Watermark Robustness

Defense Methods

2024.07.16 2025.04.03

Literature Database

Dataset and Lessons Learned from the 2024 SaTML LLM Capture-the-Flag Competition

Authors: Edoardo Debenedetti, Javier Rando, Daniel Paleka, Silaghi Fineas Florin, Dragos Albastroiu, Niv Cohen, Yuval Lemberg, Reshmi Ghosh, Rui Wen, Ahmed Salem, Giovanni Cherubin, Santiago Zanella-Beguelin, Robin Schmid, Victor Klemm, Takahiro Miki, Chenhao Li, Stefan Kraft, Mario Fritz, Florian Tramèr, Sahar Abdelnabi, Lea Schönherr | Published: 2024-06-12

LLM Security

Prompt Injection

Defense Methods

2024.06.12 2025.04.03

Literature Database

A Study of Backdoors in Instruction Fine-tuned Language Models

Authors: Jayaram Raghuram, George Kesidis, David J. Miller | Published: 2024-06-12 | Updated: 2024-08-21

LLM Security

Backdoor Attack

Defense Methods

2024.06.12 2025.04.03

Literature Database

AutoJailbreak: Exploring Jailbreak Attacks and Defenses through a Dependency Lens

Authors: Lin Lu, Hai Yan, Zenghui Yuan, Jiawen Shi, Wenqi Wei, Pin-Yu Chen, Pan Zhou | Published: 2024-06-06

LLM Performance Evaluation

Prompt Injection

Defense Methods

2024.06.06 2025.04.03

Literature Database

Exploring Vulnerabilities and Protections in Large Language Models: A Survey

Authors: Frank Weizhen Liu, Chenhui Hu | Published: 2024-06-01

LLM Security

Prompt Injection

Defense Methods

2024.06.01 2025.04.03

Literature Database

ACE: A Model Poisoning Attack on Contribution Evaluation Methods in Federated Learning

Authors: Zhangchen Xu, Fengqing Jiang, Luyao Niu, Jinyuan Jia, Bo Li, Radha Poovendran | Published: 2024-05-31 | Updated: 2024-06-05

Poisoning

Evaluation Methods

Defense Methods

2024.05.31 2025.04.03

Literature Database

Cross-Task Defense: Instruction-Tuning LLMs for Content Safety

Authors: Yu Fu, Wen Xiao, Jia Chen, Jiachen Li, Evangelos Papalexakis, Aichi Chien, Yue Dong | Published: 2024-05-24

Content Moderation

Prompt Injection

Defense Methods

2024.05.24 2025.04.03

Literature Database

A GAN-Based Data Poisoning Attack Against Federated Learning Systems and Its Countermeasure

Authors: Wei Sun, Bo Gao, Ke Xiong, Yuwei Wang | Published: 2024-05-19 | Updated: 2024-05-21

Backdoor Attack

Poisoning

Defense Methods

2024.05.19 2025.04.03

Literature Database

Dealing Doubt: Unveiling Threat Models in Gradient Inversion Attacks under Federated Learning, A Survey and Taxonomy

Authors: Yichuan Shi, Olivera Kotevska, Viktor Reshniak, Abhishek Singh, Ramesh Raskar | Published: 2024-05-16

Poisoning

Attack Methods

Defense Methods

2024.05.16 2025.04.03

Literature Database