AI Security Portal K Program
AgentVisor: Defending LLM Agents Against Prompt Injection via Semantic Virtualization
Abstract
Large Language Model (LLM) agents are increasingly used to automate complex workflows, but combining untrusted external data with privileged execution exposes them to severe security risks, particularly direct and indirect prompt injection. Existing defenses struggle to balance security and utility: rigorous protection tends to cause over-defense, while subtle indirect injections can still bypass detection. Drawing inspiration from operating system virtualization, we propose AgentVisor, a novel defense framework that enforces semantic privilege separation. AgentVisor treats the target agent as an untrusted guest and intercepts its tool calls through a trusted semantic visor. Central to our approach is a rigorous audit protocol grounded in classic OS security primitives, designed to systematically mitigate both direct and indirect injection attacks. Furthermore, we introduce a one-shot self-correction mechanism that turns security violations into constructive feedback, enabling agents to recover from attacks. Extensive experiments show that AgentVisor reduces the attack success rate to 0.65% while incurring only a 1.45% average decrease in utility relative to the No Defense baseline, outperforming existing defense methods.
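The interception, audit, and self-correction loop described in the abstract can be illustrated with a small sketch. The abstract does not specify the audit rules or interfaces, so everything here is an assumption. The names `ToolCall`, `Verdict`, `Visor`, `audit`, `mediate`, and `toy_guest` are hypothetical. A simple rule-based check stands in for AgentVisor's semantic audit. The host is also assumed to know whether a proposed call was driven by untrusted content (the `tainted` flag). The two rules loosely echo classic principles: least privilege, and an integrity rule that keeps untrusted data from driving privileged tools. `mediate` models one-shot self-correction by returning the violation reason to the guest as feedback and allowing exactly one revised proposal.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Set, Tuple


@dataclass
class ToolCall:
    """A tool invocation proposed by the untrusted guest agent (hypothetical type)."""
    tool: str
    args: dict = field(default_factory=dict)


@dataclass
class Verdict:
    allowed: bool
    reason: str = ""


# A guest takes the visor's feedback (None on the first attempt) and returns a
# proposed call plus a flag saying whether untrusted content drove it (assumed known).
Guest = Callable[[Optional[str]], Tuple[ToolCall, bool]]


@dataclass
class Visor:
    """Hypothetical trusted visor: every guest tool call must pass through mediate()."""
    authorized_tools: Set[str]   # least privilege: tools the user's task actually needs
    privileged_tools: Set[str]   # tools with side effects, e.g. moving money
    log: List[str] = field(default_factory=list)

    def audit(self, call: ToolCall, tainted: bool) -> Verdict:
        # Rule 1 (least privilege): the tool must be authorized for this task.
        if call.tool not in self.authorized_tools:
            return Verdict(False, f"tool '{call.tool}' is not authorized for this task")
        # Rule 2 (integrity): untrusted content may not drive a privileged tool.
        if tainted and call.tool in self.privileged_tools:
            return Verdict(False, f"privileged tool '{call.tool}' driven by untrusted content")
        return Verdict(True)

    def mediate(self, guest: Guest) -> Optional[ToolCall]:
        """Audit the guest's proposal, allowing exactly one self-correction attempt."""
        feedback: Optional[str] = None
        for _ in range(2):  # original proposal + one corrected retry
            call, tainted = guest(feedback)
            verdict = self.audit(call, tainted)
            if verdict.allowed:
                self.log.append(f"allow {call.tool}")
                return call
            self.log.append(f"deny {call.tool}: {verdict.reason}")
            # Turn the violation into feedback the guest can act on.
            feedback = f"Security violation: {verdict.reason}. Revise your tool call."
        return None  # still violating after one correction: block the call


def toy_guest(feedback: Optional[str]) -> Tuple[ToolCall, bool]:
    """Hypothetical guest: first obeys an injected instruction from an email, then
    falls back to the user's own request once the visor reports a violation."""
    if feedback is None:
        return ToolCall("send_money", {"to": "attacker", "amount": 1000}), True
    return ToolCall("send_money", {"to": "landlord", "amount": 1000}), False


visor = Visor(authorized_tools={"read_inbox", "send_money"},
              privileged_tools={"send_money"})
result = visor.mediate(toy_guest)
print(result)     # ToolCall(tool='send_money', args={'to': 'landlord', 'amount': 1000})
print(visor.log)
```

In the usage lines, the assumed user task is paying rent, so `send_money` is authorized. The first proposal is driven by an injected email and is denied, and the feedback lets the guest retry with the user's intended payee. Capping the loop at one retry means a persistently hijacked guest cannot keep probing the visor, because a second violation simply blocks the call.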