Abstract
Large Language Models (LLMs) are combined with tools to create powerful LLM
agents that provide a wide range of services. Unlike traditional software, an
LLM agent's behavior is determined at runtime by natural language prompts that
come from either the user or tool data. This flexibility enables a new computing
paradigm with unlimited capabilities and programmability, but it also introduces
new security risks: agents become vulnerable to privilege escalation attacks.
Moreover, user prompts are prone to being interpreted insecurely by LLM agents,
creating non-deterministic behaviors that attackers can exploit. To address
these security risks, we propose Prompt Flow Integrity (PFI), a system
security-oriented solution to prevent privilege escalation in LLM agents.
By analyzing the architectural characteristics of LLM agents, PFI derives three
mitigation techniques: agent isolation, secure untrusted data processing, and
privilege escalation guardrails. Our evaluation results show that PFI
effectively mitigates privilege escalation attacks while preserving the utility
of LLM agents.