Defeating Prompt Injections by Design

TOP Literature Database Defeating Prompt Injections by Design

arxiv

AI Security Portal bot

Information in the literature database is collected automatically.

Source

https://arxiv.org/abs/2503.18813

PDF

https://arxiv.org/pdf/2503.18813

Paper Information

Author: Edoardo Debenedetti,Ilia Shumailov,Tianqi Fan,Jamie Hayes,Nicholas Carlini,Daniel Fabian,Christoph Kern,Chongyang Shi,Andreas Terzis,Florian Tramèr
Published: 3-25-2025
Updated: 6-24-2025
Affiliation: Google
Country: United States of America
Conference: Computing Research Repository (CoRR)

Labels Estimated by AI

Indirect Prompt Injection Prompt Injection

These labels were automatically added by AI and may be inaccurate.
For details, see About Literature Database.

Abstract

Large Language Models (LLMs) are increasingly deployed in agentic systems that interact with an untrusted environment. However, LLM agents are vulnerable to prompt injection attacks when handling untrusted data. In this paper we propose CaMeL, a robust defense that creates a protective system layer around the LLM, securing it even when underlying models are susceptible to attacks. To operate, CaMeL explicitly extracts the control and data flows from the (trusted) query; therefore, the untrusted data retrieved by the LLM can never impact the program flow. To further improve security, CaMeL uses a notion of a capability to prevent the exfiltration of private data over unauthorized data flows by enforcing security policies when tools are called. We demonstrate effectiveness of CaMeL by solving $77\%$ of tasks with provable security (compared to $84\%$ with an undefended system) in AgentDojo. We release CaMeL at https://github.com/google-research/camel-prompt-injection.

External Datasets

AgentDojo