AI Security Portal K Program
PI-Hunter: Automated Red-Teaming for Exposing and Localizing Prompt Injections
Abstract
Large Language Models (LLMs) are rapidly evolving into agentic systems that interact with external tools and environments, introducing new security risks such as indirect prompt injection attacks through untrusted external sources. Existing defenses mainly focus on blocking malicious content at inference time, and current red-teaming methods primarily optimize attack success. As a result, developers have limited visibility into how latent prompt injections emerge and propagate through agents. We propose PI-Hunter, an automated agentic auditing framework for proactive vulnerability exposure in LLM agents. PI-Hunter constructs realistic source-aware test cases and iteratively evolves them through feedback-driven exploration to induce agents to retrieve and reveal latent malicious instructions embedded within external environments. Extensive experiments across multiple benchmarks, agent architectures, attacks, and defenses demonstrate that PI-Hunter substantially improves vulnerability exposure and attack-surface coverage over strong automated red-teaming baselines, while remaining effective under existing prompt injection defenses.