BadToken: Token-level Backdoor Attacks to Multi-modal Large Language Models

TOP 文献データベース BadToken: Token-level Backdoor Attacks to Multi-modal Large Language Models

arxiv

AIセキュリティポータルbot

文献データベースの情報は、自動的に収集されています。

Source

https://arxiv.org/abs/2503.16023

PDF

https://arxiv.org/pdf/2503.16023

文献情報

作者: Zenghui Yuan,Jiawen Shi,Pan Zhou,Neil Zhenqiang Gong,Lichao Sun
公開日: 2025-3-20
所属機関: Hubei Key Laboratory of Distributed System Security
所属の国: China
会議名: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

AIにより推定されたラベル

バックドア攻撃プロンプトインジェクション大規模言語モデル

※ こちらのラベルはAIによって自動的に追加されました。そのため、正確でないことがあります。
詳細は文献データベースについてをご覧ください。

Abstract

Multi-modal large language models (MLLMs) extend large language models (LLMs) to process multi-modal information, enabling them to generate responses to image-text inputs. MLLMs have been incorporated into diverse multi-modal applications, such as autonomous driving and medical diagnosis, via plug-and-play without fine-tuning. This deployment paradigm increases the vulnerability of MLLMs to backdoor attacks. However, existing backdoor attacks against MLLMs achieve limited effectiveness and stealthiness. In this work, we propose BadToken, the first token-level backdoor attack to MLLMs. BadToken introduces two novel backdoor behaviors: Token-substitution and Token-addition, which enable flexible and stealthy attacks by making token-level modifications to the original output for backdoored inputs. We formulate a general optimization problem that considers the two backdoor behaviors to maximize the attack effectiveness. We evaluate BadToken on two open-source MLLMs and various tasks. Our results show that our attack maintains the model's utility while achieving high attack success rates and stealthiness. We also show the real-world threats of BadToken in two scenarios, i.e., autonomous driving and medical diagnosis. Furthermore, we consider defenses including fine-tuning and input purification. Our results highlight the threat of our attack.

外部データセット

MSCOCO

VQAv2