Test-Time Backdoor Attacks on Multimodal Large Language Models

TOP 文献データベース Test-Time Backdoor Attacks on Multimodal Large Language Models

arxiv

AIセキュリティポータルbot

文献データベースの情報は、自動的に収集されています。

Source

https://arxiv.org/abs/2402.08577

PDF

https://arxiv.org/pdf/2402.08577

文献情報

作者: Dong Lu;Tianyu Pang;Chao Du;Qian Liu;Xianjun Yang;Min Lin
公開日: 2024-2-14
所属機関: Southern University of Science and Technology
所属の国: China
会議名: Computing Research Repository (CoRR)

AIにより推定されたラベル

バックドア攻撃攻撃手法モデル性能評価

※ こちらのラベルはAIによって自動的に追加されました。そのため、正確でないことがあります。
詳細は文献データベースについてをご覧ください。

Abstract

Backdoor attacks are commonly executed by contaminating training data, such that a trigger can activate predetermined harmful effects during the test phase. In this work, we present AnyDoor, a test-time backdoor attack against multimodal large language models (MLLMs), which involves injecting the backdoor into the textual modality using adversarial test images (sharing the same universal perturbation), without requiring access to or modification of the training data. AnyDoor employs similar techniques used in universal adversarial attacks, but distinguishes itself by its ability to decouple the timing of setup and activation of harmful effects. In our experiments, we validate the effectiveness of AnyDoor against popular MLLMs such as LLaVA-1.5, MiniGPT-4, InstructBLIP, and BLIP-2, as well as provide comprehensive ablation studies. Notably, because the backdoor is injected by a universal perturbation, AnyDoor can dynamically change its backdoor trigger prompts/harmful effects, exposing a new challenge for defending against backdoor attacks. Our project page is available at https://sail-sg.github.io/AnyDoor/.

外部データセット

VQAv2

SVIT

DALL-E