These labels were automatically added by AI and may be inaccurate. For details, see About Literature Database.
Abstract
As AI agents powered by Large Language Models (LLMs) become increasingly
versatile and capable of addressing a broad spectrum of tasks, ensuring their
security has become a critical challenge. Among the most pressing threats are
prompt injection attacks, which exploit the agent's resilience on natural
language inputs -- an especially dangerous threat when agents are granted tool
access or handle sensitive information. In this work, we propose a set of
principled design patterns for building AI agents with provable resistance to
prompt injection. We systematically analyze these patterns, discuss their
trade-offs in terms of utility and security, and illustrate their real-world
applicability through a series of case studies.