Test-Time Backdoor Attacks on Multimodal Large Language Models

Information in the literature database is collected automatically.

Source

https://arxiv.org/abs/2402.08577

PDF

https://arxiv.org/pdf/2402.08577

Paper Information

Authors: Dong Lu; Tianyu Pang; Chao Du; Qian Liu; Xianjun Yang; Min Lin
Published: 2-14-2024
Affiliation: Southern University of Science and Technology
Country: China
Conference: Computing Research Repository (CoRR)

Labels Estimated by AI

Backdoor Attack

Attack Method

Model Performance Evaluation

These labels were automatically added by AI and may be inaccurate.

Abstract

Backdoor attacks are commonly executed by contaminating training data, such that a trigger can activate predetermined harmful effects during the test phase. In this work, we present AnyDoor, a test-time backdoor attack against multimodal large language models (MLLMs), which involves injecting the backdoor into the textual modality using adversarial test images (sharing the same universal perturbation), without requiring access to or modification of the training data. AnyDoor employs similar techniques used in universal adversarial attacks, but distinguishes itself by its ability to decouple the timing of setup and activation of harmful effects. In our experiments, we validate the effectiveness of AnyDoor against popular MLLMs such as LLaVA-1.5, MiniGPT-4, InstructBLIP, and BLIP-2, as well as provide comprehensive ablation studies. Notably, because the backdoor is injected by a universal perturbation, AnyDoor can dynamically change its backdoor trigger prompts/harmful effects, exposing a new challenge for defending against backdoor attacks. Our project page is available at https://sail-sg.github.io/AnyDoor/.

External Datasets

VQAv2

SVIT

DALL-E