Defense Method

Thought Purity: Defense Paradigm For Chain-of-Thought Attack

Authors: Zihao Xue, Zhen Bi, Long Ma, Zhenlin Hu, Yan Wang, Zhenfang Liu, Qing Sheng, Jie Xiao, Jungang Lou | Published: 2025-07-16
Information Security
Threat modeling
Defense Method

Defending Against Prompt Injection With a Few DefensiveTokens

Authors: Sizhe Chen, Yizhu Wang, Nicholas Carlini, Chawin Sitawarin, David Wagner | Published: 2025-07-10
Indirect Prompt Injection
Prompt leaking
Defense Method

May I have your Attention? Breaking Fine-Tuning based Prompt Injection Defenses using Architecture-Aware Attacks

Authors: Nishit V. Pandya, Andrey Labunets, Sicun Gao, Earlence Fernandes | Published: 2025-07-10
Indirect Prompt Injection
Adversarial attack
Defense Method

iThermTroj: Exploiting Intermittent Thermal Trojans in Multi-Processor System-on-Chips

Authors: Mehdi Elahi, Mohamed R. Elshamy, Abdel-Hameed Badawy, Ahmad Patooghy | Published: 2025-07-08
Hardware Trojan Detection
Threat Model
Defense Method

LLMail-Inject: A Dataset from a Realistic Adaptive Prompt Injection Challenge

Authors: Sahar Abdelnabi, Aideen Fay, Ahmed Salem, Egor Zverev, Kai-Chieh Liao, Chi-Huang Liu, Chun-Chih Kuo, Jannis Weigend, Danyael Manlangit, Alex Apostolov, Haris Umair, João Donato, Masayuki Kawakita, Athar Mahboob, Tran Huu Bach, Tsun-Han Chiang, Myeongjin Cho, Hajin Choi, Byeonghyeon Kim, Hyeonjin Lee, Benjamin Pannell, Conor McCauley, Mark Russinovich, Andrew Paverd, Giovanni Cherubin | Published: 2025-06-11
Indirect Prompt Injection
Prompt Injection
Defense Method

Design Patterns for Securing LLM Agents against Prompt Injections

Authors: Luca Beurer-Kellner, Beat Buesser, Ana-Maria Creţu, Edoardo Debenedetti, Daniel Dobos, Daniel Fabian, Marc Fischer, David Froelicher, Kathrin Grosse, Daniel Naeff, Ezinwanne Ozoani, Andrew Paverd, Florian Tramèr, Václav Volhejn | Published: 2025-06-10 | Updated: 2025-06-11
Indirect Prompt Injection
Prompt Injection
Defense Method

Your Agent Can Defend Itself against Backdoor Attacks

Authors: Li Changjiang, Liang Jiacheng, Cao Bochuan, Chen Jinghui, Wang Ting | Published: 2025-06-10 | Updated: 2025-06-11
Poisoning attack on RAG
Backdoor Attack Mitigation
Defense Method

TokenBreak: Bypassing Text Classification Models Through Token Manipulation

Authors: Kasimir Schulz, Kenneth Yeung, Kieran Evans | Published: 2025-06-09
Adversarial Attack Methods
Defense Method

MISLEADER: Defending against Model Extraction with Ensembles of Distilled Models

Authors: Xueqi Cheng, Minxing Zheng, Shixiang Zhu, Yushun Dong | Published: 2025-06-03
Model Extraction Attack
Detection of Model Extraction Attacks
Defense Method

DELMAN: Dynamic Defense Against Large Language Model Jailbreaking with Model Editing

Authors: Yi Wang, Fenghua Weng, Sibei Yang, Zhan Qin, Minlie Huang, Wenjie Wang | Published: 2025-02-17 | Updated: 2025-05-29
LLM Security
Prompt Injection
Defense Method