These labels were automatically added by AI and may be inaccurate. For details, see About Literature Database.
Abstract
Prompt injection attacks represent a major vulnerability in Large Language
Model (LLM) deployments, where malicious instructions embedded in user inputs
can override system prompts and induce unintended behaviors. This paper
presents a novel multi-agent defense framework that employs specialized LLM
agents in coordinated pipelines to detect and neutralize prompt injection
attacks in real-time. We evaluate our approach using two distinct
architectures: a sequential chain-of-agents pipeline and a hierarchical
coordinator-based system. Our comprehensive evaluation on 55 unique prompt
injection attacks, grouped into 8 categories and totaling 400 attack instances
across two LLM platforms (ChatGLM and Llama2), demonstrates significant
security improvements. Without defense mechanisms, baseline Attack Success
Rates (ASR) reached 30% for ChatGLM and 20% for Llama2. Our multi-agent
pipeline achieved 100% mitigation, reducing ASR to 0% across all tested
scenarios. The framework demonstrates robustness across multiple attack
categories including direct overrides, code execution attempts, data
exfiltration, and obfuscation techniques, while maintaining system
functionality for legitimate queries.