These labels were automatically added by AI and may be inaccurate. For details, see About Literature Database.
Abstract
Large Language Models (LLMs) are increasingly deployed in agentic systems
that interact with an untrusted environment. However, LLM agents are vulnerable
to prompt injection attacks when handling untrusted data. In this paper we
propose CaMeL, a robust defense that creates a protective system layer around
the LLM, securing it even when underlying models are susceptible to attacks. To
operate, CaMeL explicitly extracts the control and data flows from the
(trusted) query; therefore, the untrusted data retrieved by the LLM can never
impact the program flow. To further improve security, CaMeL uses a notion of a
capability to prevent the exfiltration of private data over unauthorized data
flows by enforcing security policies when tools are called. We demonstrate
effectiveness of CaMeL by solving $77\%$ of tasks with provable security
(compared to $84\%$ with an undefended system) in AgentDojo. We release CaMeL
at https://github.com/google-research/camel-prompt-injection.