These labels were automatically added by AI and may be inaccurate. For details, see About Literature Database.
Abstract
As Large Language Models (LLMs) grow increasingly powerful, multi-agent
systems are becoming more prevalent in modern AI applications. Most safety
research, however, has focused on vulnerabilities in single-agent LLMs. These
include prompt injection attacks, where malicious prompts embedded in external
content trick the LLM into executing unintended or harmful actions,
compromising the victim's application. In this paper, we reveal a more
dangerous vector: LLM-to-LLM prompt injection within multi-agent systems. We
introduce Prompt Infection, a novel attack where malicious prompts
self-replicate across interconnected agents, behaving much like a computer
virus. This attack poses severe threats, including data theft, scams,
misinformation, and system-wide disruption, all while propagating silently
through the system. Our extensive experiments demonstrate that multi-agent
systems are highly susceptible, even when agents do not publicly share all
communications. To address this, we propose LLM Tagging, a defense mechanism
that, when combined with existing safeguards, significantly mitigates infection
spread. This work underscores the urgent need for advanced security measures as
multi-agent LLM systems become more widely adopted.
External Datasets
120 user instructions
360 unique pairs of user instructions and attack phrases
synthetic user data (names, occupations, email addresses, phone numbers)