Attention is All You Need to Defend Against Indirect Prompt Injection Attacks in LLMs

arxiv

AI Security Portal bot

Information in the literature database is collected automatically.

Source

https://arxiv.org/abs/2512.08417

PDF

https://arxiv.org/pdf/2512.08417

Paper Information

Author: Yinan Zhong, Qianhao Miao, Yanjiao Chen, Jiangyi Deng, Yushi Cheng, Wenyuan Xu
Published: 2025-12-09
Affiliation: Zhejiang University
Country: China

Labels Estimated by AI

Large Language Model, Indirect Prompt Injection, Prompt validation

These labels were automatically added by AI and may be inaccurate.
For details, see About Literature Database.

Abstract

Large Language Models (LLMs) have been integrated into many applications (e.g., web agents) to perform more sophisticated tasks. However, LLM-empowered applications are vulnerable to Indirect Prompt Injection (IPI) attacks, where instructions are injected via untrustworthy external data sources. This paper presents Rennervate, a defense framework to detect and prevent IPI attacks. Rennervate leverages attention features to detect the covert injection at a fine-grained token level, enabling precise sanitization that neutralizes IPI attacks while maintaining LLM functionalities. Specifically, the token-level detector is materialized with a 2-step attentive pooling mechanism, which aggregates attention heads and response tokens for IPI detection and sanitization. Moreover, we establish a fine-grained IPI dataset, FIPI, to be open-sourced to support further research. Extensive experiments verify that Rennervate outperforms 15 commercial and academic IPI defense methods, achieving high precision on 5 LLMs and 6 datasets. We also demonstrate that Rennervate is transferable to unseen attacks and robust against adaptive adversaries.
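
The abstract names a 2-step attentive pooling mechanism but does not specify it, so the following is only an illustrative sketch of the general idea, not the authors' implementation. It assumes an attention tensor attn[h][r][t] (attention from response token r to input token t in head h), softmax-normalized pooling weights over heads and then over response tokens, and a fixed score threshold for sanitization. The function names, the logits, the toy data, and the threshold are all hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def two_step_attentive_pooling(attn, head_logits, resp_logits):
    """Return one score per input token.

    attn[h][r][t] is the attention from response token r to input token t
    in head h. head_logits and resp_logits stand in for learned pooling
    parameters (an assumption; the abstract does not describe them).
    """
    head_w = softmax(head_logits)
    resp_w = softmax(resp_logits)
    n_heads, n_resp, n_tok = len(attn), len(attn[0]), len(attn[0][0])

    # Step 1: weighted pooling over attention heads.
    pooled = [
        [sum(head_w[h] * attn[h][r][t] for h in range(n_heads))
         for t in range(n_tok)]
        for r in range(n_resp)
    ]
    # Step 2: weighted pooling over response tokens.
    return [
        sum(resp_w[r] * pooled[r][t] for r in range(n_resp))
        for t in range(n_tok)
    ]


def sanitize(tokens, scores, threshold):
    """Drop input tokens whose pooled score reaches the threshold."""
    return [tok for tok, s in zip(tokens, scores) if s < threshold]


# Toy example: 2 heads, 2 response tokens, 6 input tokens (made-up values).
tokens = ["weather", "is", "sunny", "ignore", "previous", "instructions"]
attn = [
    [[0.05, 0.05, 0.05, 0.35, 0.25, 0.25],
     [0.10, 0.10, 0.10, 0.30, 0.20, 0.20]],
    [[0.30, 0.30, 0.20, 0.10, 0.05, 0.05],
     [0.20, 0.20, 0.20, 0.20, 0.10, 0.10]],
]
scores = two_step_attentive_pooling(attn, head_logits=[2.0, 0.0],
                                    resp_logits=[0.0, 0.0])
cleaned = sanitize(tokens, scores, threshold=0.15)
```

In this toy setup the first head is weighted more heavily and concentrates its attention on the last three tokens, so those tokens score highest and are removed while the benign prefix is kept. A real detector would learn the pooling weights and the decision rule from labeled data rather than use fixed values.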