Liuji Chen, Hao Gao, Jinghao Zhang, Qiang Liu, Shu Wu, Liang Wang
Published
4-7-2025
Affiliation
New Laboratory of Pattern Recognition (NLPR), State Key Laboratory of Multimodal Artificial Intelligence Systems (MAIS), Institute of Automation, Chinese Academy of Sciences
Abstract
Tool learning serves as a powerful auxiliary mechanism that extends the
capabilities of large language models (LLMs), enabling them to tackle complex
tasks requiring real-time relevance or high-precision operations. These
powerful capabilities, however, come with potential security issues. Previous
work has primarily focused on making the output of invoked tools incorrect or
malicious, with little attention given to the manipulation of tool selection.
To fill this gap, we introduce in this paper, for the first time, a black-box
text-based attack that can significantly increase the probability of a target
tool being selected. We propose a two-level text perturbation attack with
coarse-to-fine granularity, perturbing the text at both the word level and the
character level. Comprehensive experiments demonstrate that an attacker only
needs to make small perturbations to a tool's textual information to
significantly increase the probability of the target tool being selected and
ranked higher among the candidate tools. Our research reveals the
vulnerability of the tool selection process and paves the way for future
research on protecting this process.
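
The abstract does not specify the attack procedure itself. Purely as an
illustration of the coarse-to-fine, two-level idea, the following is a minimal
Python sketch of a greedy black-box search that applies word-level and
character-level perturbations to a tool's description; the `score` oracle, the
toy synonym table, and all function names here are hypothetical stand-ins, not
the authors' method.

    import random
    import string
    from typing import Callable

    def char_perturb(word: str, rng: random.Random) -> str:
        """Character-level (fine) perturbation: swap two adjacent
        characters or insert a random letter (illustrative choices)."""
        if len(word) < 2:
            return word
        i = rng.randrange(len(word) - 1)
        if rng.random() < 0.5:
            chars = list(word)
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
            return "".join(chars)
        return word[:i] + rng.choice(string.ascii_lowercase) + word[i:]

    def word_perturb(words: list[str], rng: random.Random,
                     candidates: dict[str, list[str]]) -> list[str]:
        """Word-level (coarse) perturbation: replace one word with a
        substitute from a toy synonym table (a stand-in for a real one)."""
        replaceable = [i for i, w in enumerate(words) if w in candidates]
        if not replaceable:
            return words
        i = rng.choice(replaceable)
        out = list(words)
        out[i] = rng.choice(candidates[words[i]])
        return out

    def greedy_attack(description: str,
                      score: Callable[[str], float],
                      synonyms: dict[str, list[str]],
                      steps: int = 50,
                      seed: int = 0) -> str:
        """Greedy coarse-to-fine search: try a word-level edit, then a
        character-level edit, keeping any change that raises the
        black-box selection score."""
        rng = random.Random(seed)
        best, best_score = description, score(description)
        for _ in range(steps):
            cand = " ".join(word_perturb(best.split(), rng, synonyms))
            w = cand.split()
            j = rng.randrange(len(w))
            w[j] = char_perturb(w[j], rng)
            cand = " ".join(w)
            s = score(cand)
            if s > best_score:
                best, best_score = cand, s
        return best

    if __name__ == "__main__":
        # Toy stand-in for the victim's tool-selection scorer: rewards
        # descriptions containing certain keywords. A real attack would
        # query the LLM-based tool selector as a black box instead.
        def toy_score(text: str) -> float:
            return sum(text.lower().count(k) for k in ("weather", "forecast"))

        synonyms = {"conditions": ["weather"], "prediction": ["forecast"]}
        desc = "Returns current conditions and a prediction for any city."
        print(greedy_attack(desc, toy_score, synonyms))

The greedy accept-if-better loop is one simple way to realize a black-box
text attack when only selection scores (not gradients) are observable; the
paper's actual optimization strategy may differ.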