Misusing Tools in Large Language Models With Visual Adversarial Examples


arXiv

AI Security Portal bot

The information in this literature database is collected automatically.

Source

https://arxiv.org/abs/2310.03185

PDF

https://arxiv.org/pdf/2310.03185

Bibliographic Information

Authors: Xiaohan Fu; Zihan Wang; Shuheng Li; Rajesh K. Gupta; Niloofar Mireshghallah; Taylor Berg-Kirkpatrick; Earlence Fernandes
Publication date: 2023-10-05
Affiliation: University of California San Diego
Country: United States of America
Venue: Computing Research Repository (CoRR)

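AI-Estimated Labels

Prompt injection, LLM performance evaluation, Adversarial examples

Note: These labels were added automatically by AI and may therefore be inaccurate.
For details, see "About the Literature Database".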

Abstract

Large Language Models (LLMs) are being enhanced with the ability to use tools and to process multiple modalities. These new capabilities bring new benefits and also new security risks. In this work, we show that an attacker can use visual adversarial examples to cause attacker-desired tool usage. For example, the attacker could cause a victim LLM to delete calendar events, leak private conversations and book hotels. Different from prior work, our attacks can affect the confidentiality and integrity of user resources connected to the LLM while being stealthy and generalizable to multiple input prompts. We construct these attacks using gradient-based adversarial training and characterize performance along multiple dimensions. We find that our adversarial images can manipulate the LLM to invoke tools following real-world syntax almost always (~98%) while maintaining high similarity to clean images (~0.9 SSIM). Furthermore, using human scoring and automated metrics, we find that the attacks do not noticeably affect the conversation (and its semantics) between the user and the LLM.
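The abstract names two technical ingredients: gradient-based optimization of the input image, and SSIM as the measure of how close the adversarial image stays to the clean one. The following is a minimal, hypothetical Python sketch of both, assuming NumPy, an L∞-bounded projected-gradient ascent (a PGD-style update, a common choice; the paper's exact objective and constraints may differ), and a toy linear scorer standing in for the victim LLM's likelihood of emitting the target tool call. The SSIM below is a single-window (global) simplification; the standard metric averages over local windows, so its values are not directly comparable to the ~0.9 reported above.

```python
import numpy as np

def global_ssim(x, y, data_range=1.0):
    """Single-window SSIM over the whole image (a simplification of windowed SSIM)."""
    x = x.astype(np.float64)
    y = y.astype(np.float64)
    c1 = (0.01 * data_range) ** 2  # stabilizes the luminance term
    c2 = (0.03 * data_range) ** 2  # stabilizes the contrast/structure term
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    num = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    den = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return num / den

def pgd_linf(x, grad_fn, eps=8 / 255, step=2 / 255, n_steps=10):
    """Ascend an objective with sign-gradient steps, projecting back each iteration."""
    x_adv = x.copy()
    for _ in range(n_steps):
        g = grad_fn(x_adv)
        x_adv = x_adv + step * np.sign(g)
        x_adv = np.clip(x_adv, x - eps, x + eps)  # stay within eps of the clean image
        x_adv = np.clip(x_adv, 0.0, 1.0)          # stay a valid image
    return x_adv

# Hypothetical toy "model": a fixed linear scorer standing in for the
# LLM's log-likelihood of the attacker's target tool-call text.
rng = np.random.default_rng(0)
x_clean = rng.uniform(0.0, 1.0, size=(32, 32))
w = rng.normal(size=(32, 32))

def target_score(img):
    return float((w * img).sum())

def target_grad(img):
    return w  # the gradient of a linear score is constant

x_adv = pgd_linf(x_clean, target_grad)
print(f"score: {target_score(x_clean):.2f} -> {target_score(x_adv):.2f}")
print(f"SSIM(clean, adv) = {global_ssim(x_clean, x_adv):.3f}")
```

In a real attack of this kind, the gradient would come from backpropagating a loss on the desired output through the multimodal LLM to the image pixels; the linear scorer only illustrates the update, the projection onto the perturbation budget, and the similarity check.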

External Datasets

Alpaca instruction dataset