Paper Information
- Author
- Hanrong Zhang,Jingyuan Huang,Kai Mei,Yifei Yao,Zhenting Wang,Chenlu Zhan,Hongwei Wang,Yongfeng Zhang
- Published
- 10-4-2024
- Updated
- 4-16-2025
- Affiliation
- Zhejiang University
- Country
- China
- Conference
- International Conference on Learning Representations (ICLR)
Abstract
Although LLM-based agents, powered by Large Language Models (LLMs), can use
external tools and memory mechanisms to solve complex real-world tasks, they
may also introduce critical security vulnerabilities. However, the existing
literature does not comprehensively evaluate attacks and defenses against
LLM-based agents. To address this, we introduce Agent Security Bench (ASB), a
comprehensive framework designed to formalize, benchmark, and evaluate the
attacks and defenses of LLM-based agents, including 10 scenarios (e.g.,
e-commerce, autonomous driving, finance), 10 agents targeting the scenarios,
over 400 tools, 27 different types of attack/defense methods, and 7 evaluation
metrics. Based on ASB, we benchmark 10 prompt injection attacks, a memory
poisoning attack, a novel Plan-of-Thought backdoor attack, 4 mixed attacks, and
11 corresponding defenses across 13 LLM backbones. Our benchmark results reveal
critical vulnerabilities in different stages of agent operation, including
system prompt, user prompt handling, tool usage, and memory retrieval, with the
highest average attack success rate of 84.30\%, but limited effectiveness shown
in current defenses, unveiling important works to be done in terms of agent
security for the community. We also introduce a new metric to evaluate the
agents' capability to balance utility and security. Our code can be found at
https://github.com/agiresearch/ASB.