Adversarial Attack

May I have your Attention? Breaking Fine-Tuning based Prompt Injection Defenses using Architecture-Aware Attacks

Authors: Nishit V. Pandya, Andrey Labunets, Sicun Gao, Earlence Fernandes | Published: 2025-07-10
Indirect Prompt Injection
Adversarial Attack
Defense Method

BarkBeetle: Stealing Decision Tree Models with Fault Injection

Authors: Qifan Wang, Jonas Sander, Minmin Jiang, Thomas Eisenbarth, David Oswald | Published: 2025-07-09
Model Extraction Attack
Adversarial Attack
Feature Selection Method

CAVGAN: Unifying Jailbreak and Defense of LLMs via Generative Adversarial Attacks on their Internal Representations

Authors: Xiaohu Li, Yunfeng Ning, Zepeng Bao, Mayi Xu, Jianhao Chen, Tieyun Qian | Published: 2025-07-08
Prompt Injection
Adversarial Attack
Defense Effectiveness Analysis

The Hidden Threat in Plain Text: Attacking RAG Data Loaders

Authors: Alberto Castagnaro, Umberto Salviati, Mauro Conti, Luca Pajola, Simeone Pizzi | Published: 2025-07-07
Poisoning Attack on RAG
Large Language Model
Adversarial Attack

Amplifying Machine Learning Attacks Through Strategic Compositions

Authors: Yugeng Liu, Zheng Li, Hai Huang, Michael Backes, Yang Zhang | Published: 2025-06-23
Membership Disclosure Risk
Certified Robustness
Adversarial Attack

LLMs Cannot Reliably Judge (Yet?): A Comprehensive Assessment on the Robustness of LLM-as-a-Judge

Authors: Songze Li, Chuokun Xu, Jiaying Wang, Xueluan Gong, Chen Chen, Jirui Zhang, Jun Wang, Kwok-Yan Lam, Shouling Ji | Published: 2025-06-11
Disabling Safety Mechanisms of LLM
Prompt Injection
Adversarial Attack

A Cryptographic Perspective on Mitigation vs. Detection in Machine Learning

Authors: Greg Gluch, Shafi Goldwasser | Published: 2025-04-28 | Updated: 2025-07-10
Certified Robustness
Adversarial Attack
Computational Problem

Support is All You Need for Certified VAE Training

Authors: Changming Xu, Debangshu Banerjee, Deepak Vasisht, Gagandeep Singh | Published: 2025-04-16
Improvement of Learning
Adversarial Attack
Watermark Design

Language Models May Verbatim Complete Text They Were Not Explicitly Trained On

Authors: Ken Ziyu Liu, Christopher A. Choquette-Choo, Matthew Jagielski, Peter Kairouz, Sanmi Koyejo, Percy Liang, Nicolas Papernot | Published: 2025-03-21 | Updated: 2025-03-25
RAG
Membership Disclosure Risk
Adversarial Attack

Immune: Improving Safety Against Jailbreaks in Multi-modal LLMs via Inference-Time Alignment

Authors: Soumya Suvra Ghosal, Souradip Chakraborty, Vaibhav Singh, Tianrui Guan, Mengdi Wang, Ahmad Beirami, Furong Huang, Alvaro Velasquez, Dinesh Manocha, Amrit Singh Bedi | Published: 2024-11-27 | Updated: 2025-03-20
Prompt Injection
Safety Alignment
Adversarial Attack