Hyeonjeong Ha, Qiusi Zhan, Jeonghwan Kim, Dimitrios Bralios, Saikrishna Sanniboina, Nanyun Peng, Kai-Wei Chang, Daniel Kang, Heng Ji
Published: 2-25-2025
Updated: 10-8-2025
Affiliation: University of Illinois Urbana-Champaign
Country: United States of America
Abstract
Multimodal large language models with Retrieval Augmented Generation (RAG)
have significantly advanced tasks such as multimodal question answering by
grounding responses in external text and images. This grounding improves
factuality, reduces hallucination, and extends reasoning beyond parametric
knowledge. However, this reliance on external knowledge poses a critical yet
underexplored safety risk: knowledge poisoning attacks, where adversaries
deliberately inject adversarial multimodal content into external knowledge
bases to steer the model toward generating incorrect or even harmful responses. To
expose such vulnerabilities, we propose MM-PoisonRAG, the first framework to
systematically design knowledge poisoning attacks in multimodal RAG. We introduce two
complementary attack strategies: Localized Poisoning Attack (LPA), which
implants targeted multimodal misinformation to manipulate specific queries, and
Globalized Poisoning Attack (GPA), which injects a single piece of adversarial
knowledge to broadly disrupt reasoning and induce nonsensical responses across all
queries. Comprehensive experiments across tasks, models, and access settings
show that LPA achieves targeted manipulation with attack success rates of up to
56%, while GPA completely disrupts model generation, driving accuracy to 0% with
just a single adversarial knowledge injection. Our results reveal the fragility of
multimodal RAG and highlight the urgent need for defenses against knowledge
poisoning.
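
To make the threat model concrete, the sketch below illustrates the general idea of knowledge poisoning against a retrieval step. It is a hypothetical toy example, not the MM-PoisonRAG implementation: it uses a bag-of-words cosine similarity as a stand-in for the paper's multimodal retriever, and the passages, queries, and function names are invented for illustration.

```python
# Toy illustration of knowledge poisoning in a retrieval-augmented pipeline.
# Hypothetical sketch only: bag-of-words cosine similarity stands in for the
# multimodal retriever; this is NOT the authors' MM-PoisonRAG code.
from collections import Counter
import math

def embed(text):
    """Toy 'embedding': lowercase bag-of-words term counts."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, knowledge_base, k=1):
    """Return the top-k passages most similar to the query."""
    q = embed(query)
    ranked = sorted(knowledge_base, key=lambda p: cosine(q, embed(p)), reverse=True)
    return ranked[:k]

# Clean knowledge base.
kb = [
    "The Eiffel Tower is located in Paris and was completed in 1889.",
    "The Great Wall of China stretches thousands of kilometers.",
]

# LPA-style poisoning (illustrative): a passage crafted to be highly similar to
# one target query, so it is retrieved in place of the correct passage and
# grounds the model in misinformation for that query only.
target_query = "When was the Eiffel Tower completed?"
lpa_passage = ("When was the Eiffel Tower completed? "
               "The Eiffel Tower was completed in 1920.")
kb.append(lpa_passage)

print(retrieve(target_query, kb))                      # poisoned passage ranks first
print(retrieve("How long is the Great Wall of China?", kb))  # other queries unaffected

# A GPA-style attack would instead optimize a single passage to rank highly for
# a broad distribution of queries, disrupting generation across the board; that
# optimization is beyond this toy lexical-overlap setup.
```

The design point the sketch tries to convey is that the attacker never touches the model itself: once an adversarial entry wins the retrieval ranking, the generator faithfully grounds its answer in the injected content.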