May I have your Attention? Breaking Fine-Tuning based Prompt Injection Defenses using Architecture-Aware Attacks
Authors: Nishit V. Pandya, Andrey Labunets, Sicun Gao, Earlence Fernandes | Published: 2025-07-10
Tags: Indirect Prompt Injection, Adversarial attack, Defense Method

BarkBeetle: Stealing Decision Tree Models with Fault Injection
Authors: Qifan Wang, Jonas Sander, Minmin Jiang, Thomas Eisenbarth, David Oswald | Published: 2025-07-09
Tags: Model Extraction Attack, Adversarial attack, Feature Selection Method

CAVGAN: Unifying Jailbreak and Defense of LLMs via Generative Adversarial Attacks on their Internal Representations
Authors: Xiaohu Li, Yunfeng Ning, Zepeng Bao, Mayi Xu, Jianhao Chen, Tieyun Qian | Published: 2025-07-08
Tags: Prompt Injection, Adversarial attack, Defense Effectiveness Analysis

The Hidden Threat in Plain Text: Attacking RAG Data Loaders
Authors: Alberto Castagnaro, Umberto Salviati, Mauro Conti, Luca Pajola, Simeone Pizzi | Published: 2025-07-07
Tags: Poisoning attack on RAG, Large Language Model, Adversarial attack

Amplifying Machine Learning Attacks Through Strategic Compositions
Authors: Yugeng Liu, Zheng Li, Hai Huang, Michael Backes, Yang Zhang | Published: 2025-06-23
Tags: Membership Disclosure Risk, Certified Robustness, Adversarial attack

LLMs Cannot Reliably Judge (Yet?): A Comprehensive Assessment on the Robustness of LLM-as-a-Judge
Authors: Songze Li, Chuokun Xu, Jiaying Wang, Xueluan Gong, Chen Chen, Jirui Zhang, Jun Wang, Kwok-Yan Lam, Shouling Ji | Published: 2025-06-11
Tags: Disabling Safety Mechanisms of LLM, Prompt Injection, Adversarial attack

A Cryptographic Perspective on Mitigation vs. Detection in Machine Learning
Authors: Greg Gluch, Shafi Goldwasser | Published: 2025-04-28 | Updated: 2025-07-10
Tags: Certified Robustness, Adversarial attack, Computational Problem

Support is All You Need for Certified VAE Training
Authors: Changming Xu, Debangshu Banerjee, Deepak Vasisht, Gagandeep Singh | Published: 2025-04-16
Tags: Improvement of Learning, Adversarial attack, Watermark Design

Language Models May Verbatim Complete Text They Were Not Explicitly Trained On
Authors: Ken Ziyu Liu, Christopher A. Choquette-Choo, Matthew Jagielski, Peter Kairouz, Sanmi Koyejo, Percy Liang, Nicolas Papernot | Published: 2025-03-21 | Updated: 2025-03-25
Tags: RAG, Membership Disclosure Risk, Adversarial attack

Immune: Improving Safety Against Jailbreaks in Multi-modal LLMs via Inference-Time Alignment
Authors: Soumya Suvra Ghosal, Souradip Chakraborty, Vaibhav Singh, Tianrui Guan, Mengdi Wang, Ahmad Beirami, Furong Huang, Alvaro Velasquez, Dinesh Manocha, Amrit Singh Bedi | Published: 2024-11-27 | Updated: 2025-03-20
Tags: Prompt Injection, Safety Alignment, Adversarial attack