These labels were automatically added by AI and may be inaccurate. For details, see About Literature Database.
Abstract
The rapid advancement of Large Language Models (LLMs) has led to the
emergence of Multi-Agent Systems (MAS) to perform complex tasks through
collaboration. However, the intricate nature of MAS, including their
architecture and agent interactions, raises significant concerns regarding
intellectual property (IP) protection. In this paper, we introduce MASLEAK, a
novel attack framework designed to extract sensitive information from MAS
applications. MASLEAK targets a practical, black-box setting, where the
adversary has no prior knowledge of the MAS architecture or agent
configurations. The adversary can only interact with the MAS through its public
API, submitting attack query $q$ and observing outputs from the final agent.
Inspired by how computer worms propagate and infect vulnerable network hosts,
MASLEAK carefully crafts adversarial query $q$ to elicit, propagate, and retain
responses from each MAS agent that reveal a full set of proprietary components,
including the number of agents, system topology, system prompts, task
instructions, and tool usages. We construct the first synthetic dataset of MAS
applications with 810 applications and also evaluate MASLEAK against real-world
MAS applications, including Coze and CrewAI. MASLEAK achieves high accuracy in
extracting MAS IP, with an average attack success rate of 87% for system
prompts and task instructions, and 92% for system architecture in most cases.
We conclude by discussing the implications of our findings and the potential
defenses.