Hyeonjeong Ha, Qiusi Zhan, Jeonghwan Kim, Dimitrios Bralios, Saikrishna Sanniboina, Nanyun Peng, Kai-Wei Chang, Daniel Kang, Heng Ji
Published: 2-25-2025
Updated: 10-8-2025
Affiliation: University of Illinois Urbana-Champaign
Country: United States of America
Abstract
Multimodal large language models with Retrieval Augmented Generation (RAG)
have significantly advanced tasks such as multimodal question answering by
grounding responses in external text and images. This grounding improves
factuality, reduces hallucination, and extends reasoning beyond parametric
knowledge. However, this reliance on external knowledge poses a critical yet
underexplored safety risk: knowledge poisoning attacks, where adversaries
deliberately inject adversarial multimodal content into external knowledge
bases to steer the model toward generating incorrect or even harmful responses. To
expose such vulnerabilities, we propose MM-PoisonRAG, the first framework to
systematically design knowledge poisoning attacks in multimodal RAG. We introduce two
complementary attack strategies: Localized Poisoning Attack (LPA), which
implants targeted multimodal misinformation to manipulate specific queries, and
Globalized Poisoning Attack (GPA), which injects a single piece of adversarial
knowledge to broadly disrupt reasoning and induce nonsensical responses across all
queries. Comprehensive experiments across tasks, models, and access settings
show that LPA achieves targeted manipulation with attack success rates of up to
56%, while GPA completely disrupts model generation, driving accuracy to 0% with
just a single adversarial knowledge injection. Our results reveal the fragility of
multimodal RAG and highlight the urgent need for defenses against knowledge
poisoning.
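
To make the threat model concrete, the sketch below illustrates the general idea of knowledge poisoning against a retrieval step. It is a hypothetical toy example, not the MM-PoisonRAG implementation: it uses a bag-of-words cosine similarity as a stand-in for the paper's multimodal retriever, and the passages, queries, and function names are invented for illustration.

```python
# Toy illustration of knowledge poisoning in a retrieval-augmented pipeline.
# Hypothetical sketch only: bag-of-words cosine similarity stands in for the
# multimodal retriever; this is NOT the authors' MM-PoisonRAG code.
from collections import Counter
import math

def embed(text):
    """Toy 'embedding': lowercase bag-of-words term counts."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, knowledge_base, k=1):
    """Return the top-k passages most similar to the query."""
    q = embed(query)
    ranked = sorted(knowledge_base, key=lambda p: cosine(q, embed(p)), reverse=True)
    return ranked[:k]

# Clean knowledge base.
kb = [
    "The Eiffel Tower is located in Paris and was completed in 1889.",
    "The Great Wall of China stretches thousands of kilometers.",
]

# LPA-style poisoning (illustrative): a passage crafted to be highly similar to
# one target query, so it is retrieved in place of the correct passage and
# grounds the model in misinformation for that query only.
target_query = "When was the Eiffel Tower completed?"
lpa_passage = ("When was the Eiffel Tower completed? "
               "The Eiffel Tower was completed in 1920.")
kb.append(lpa_passage)

print(retrieve(target_query, kb))                      # poisoned passage ranks first
print(retrieve("How long is the Great Wall of China?", kb))  # other queries unaffected

# A GPA-style attack would instead optimize a single passage to rank highly for
# a broad distribution of queries, disrupting generation across the board; that
# optimization is beyond this toy lexical-overlap setup.
```

The design point the sketch tries to convey is that the attacker never touches the model itself: once an adversarial entry wins the retrieval ranking, the generator faithfully grounds its answer in the injected content.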