Abstract
The Model Context Protocol (MCP) has emerged as a universal standard that
enables AI agents to seamlessly connect with external tools, significantly
enhancing their functionality. However, while MCP brings notable benefits, it
also introduces serious vulnerabilities, such as Tool Poisoning Attacks
(TPA), where hidden malicious instructions exploit the sycophancy of large
language models (LLMs) to manipulate agent behavior. Despite these risks,
current academic research on MCP security remains limited, with most studies
focusing on narrow or qualitative analyses that fail to capture the diversity
of real-world threats. To address this gap, we present the MCP Attack Library
(MCPLIB), which categorizes and implements 31 distinct attack methods under
four key classifications: direct tool injection, indirect tool injection,
malicious user attacks, and LLM inherent attacks. We further conduct a
quantitative analysis of the efficacy of each attack. Our experiments reveal
key insights into MCP vulnerabilities, including agents' blind reliance on tool
descriptions, sensitivity to file-based attacks, susceptibility to chain
attacks that exploit shared context, and difficulty distinguishing external
data from executable
commands. These findings, validated through attack experiments, underscore the
urgent need for robust defense strategies and informed MCP design. Our
contributions include 1) constructing a comprehensive MCP attack taxonomy, 2)
introducing a unified attack framework, MCPLIB, and 3) conducting an empirical
vulnerability analysis to enhance MCP security mechanisms. This work provides a
foundational framework that supports the secure evolution of MCP ecosystems.