May I have your Attention? Breaking Fine-Tuning based Prompt Injection Defenses using Architecture-Aware Attacks | Authors: Nishit V. Pandya, Andrey Labunets, Sicun Gao, Earlence Fernandes | Published: 2025-07-10 | Tags: Indirect Prompt Injection, Adversarial Attack, Defense Methods
BarkBeetle: Stealing Decision Tree Models with Fault Injection | Authors: Qifan Wang, Jonas Sander, Minmin Jiang, Thomas Eisenbarth, David Oswald | Published: 2025-07-09 | Tags: Model Extraction Attack, Adversarial Attack, Feature Selection Methods
CAVGAN: Unifying Jailbreak and Defense of LLMs via Generative Adversarial Attacks on their Internal Representations | Authors: Xiaohu Li, Yunfeng Ning, Zepeng Bao, Mayi Xu, Jianhao Chen, Tieyun Qian | Published: 2025-07-08 | Tags: Prompt Injection, Adversarial Attack, Defense Effectiveness Analysis
The Hidden Threat in Plain Text: Attacking RAG Data Loaders | Authors: Alberto Castagnaro, Umberto Salviati, Mauro Conti, Luca Pajola, Simeone Pizzi | Published: 2025-07-07 | Tags: Poisoning Attacks on RAG, Large Language Models, Adversarial Attack
Amplifying Machine Learning Attacks Through Strategic Compositions | Authors: Yugeng Liu, Zheng Li, Hai Huang, Michael Backes, Yang Zhang | Published: 2025-06-23 | Tags: Membership Disclosure Risk, Model Robustness Guarantees, Adversarial Attack
LLMs Cannot Reliably Judge (Yet?): A Comprehensive Assessment on the Robustness of LLM-as-a-Judge | Authors: Songze Li, Chuokun Xu, Jiaying Wang, Xueluan Gong, Chen Chen, Jirui Zhang, Jun Wang, Kwok-Yan Lam, Shouling Ji | Published: 2025-06-11 | Tags: Bypassing LLM Safety Mechanisms, Prompt Injection, Adversarial Attack
A Cryptographic Perspective on Mitigation vs. Detection in Machine Learning | Authors: Greg Gluch, Shafi Goldwasser | Published: 2025-04-28 | Updated: 2025-07-10 | Tags: Model Robustness Guarantees, Adversarial Attack, Computational Problems
Support is All You Need for Certified VAE Training | Authors: Changming Xu, Debangshu Banerjee, Deepak Vasisht, Gagandeep Singh | Published: 2025-04-16 | Tags: Training Improvements, Adversarial Attack, Watermark Design
Language Models May Verbatim Complete Text They Were Not Explicitly Trained On | Authors: Ken Ziyu Liu, Christopher A. Choquette-Choo, Matthew Jagielski, Peter Kairouz, Sanmi Koyejo, Percy Liang, Nicolas Papernot | Published: 2025-03-21 | Updated: 2025-03-25 | Tags: RAG, Membership Disclosure Risk, Adversarial Attack
Immune: Improving Safety Against Jailbreaks in Multi-modal LLMs via Inference-Time Alignment | Authors: Soumya Suvra Ghosal, Souradip Chakraborty, Vaibhav Singh, Tianrui Guan, Mengdi Wang, Ahmad Beirami, Furong Huang, Alvaro Velasquez, Dinesh Manocha, Amrit Singh Bedi | Published: 2024-11-27 | Updated: 2025-03-20 | Tags: Prompt Injection, Safety Alignment, Adversarial Attack