Design Patterns for Securing LLM Agents against Prompt Injections

Source

https://arxiv.org/abs/2506.08837

PDF

https://arxiv.org/pdf/2506.08837

Paper Information

Author: Luca Beurer-Kellner, Beat Buesser, Ana-Maria Creţu, Edoardo Debenedetti, Daniel Dobos, Daniel Fabian, Marc Fischer, David Froelicher, Kathrin Grosse, Daniel Naeff, Ezinwanne Ozoani, Andrew Paverd, Florian Tramèr, Václav Volhejn
Published: 2025-06-10
Updated: 2025-06-11
Affiliation: Invariant Labs
Country: Switzerland
Conference: Computing Research Repository (CoRR)

Labels Estimated by AI

Prompt Injection, Defense Method, Indirect Prompt Injection

These labels were automatically added by AI and may be inaccurate.

Abstract

As AI agents powered by Large Language Models (LLMs) become increasingly versatile and capable of addressing a broad spectrum of tasks, ensuring their security has become a critical challenge. Among the most pressing threats are prompt injection attacks, which exploit the agent's reliance on natural language inputs, an especially dangerous threat when agents are granted tool access or handle sensitive information. In this work, we propose a set of principled design patterns for building AI agents with provable resistance to prompt injection. We systematically analyze these patterns, discuss their trade-offs in terms of utility and security, and illustrate their real-world applicability through a series of case studies.