On the Feasibility of Hijacking MLLMs’ Decision Chain via One Perturbation

Authors: Changyue Li, Jiaying Li, Youliang Yuan, Jiaming He, Zhicong Huang, Pinjia He | Published: 2025-11-25

2025.11.252025.11.27

Authors: Changyue Li, Jiaying Li, Youliang Yuan, Jiaming He, Zhicong Huang, Pinjia He
Published: 2025-11-25

Source: https://arxiv.org/abs/2511.20002

PDF: https://arxiv.org/pdf/2511.20002

Labels Predicted by AI

Adaptive Adversarial Training Robustness Improvement Method Image Processing

Please note that these labels were automatically added by AI. Therefore, they may not be entirely accurate.
For more details, please see the About the Literature Database page.

Abstract

Conventional adversarial attacks focus on manipulating a single decision of neural networks. However, real-world models often operate in a sequence of decisions, where an isolated mistake can be easily corrected, but cascading errors can lead to severe risks. This paper reveals a novel threat: a single perturbation can hijack the whole decision chain. We demonstrate the feasibility of manipulating a model’s outputs toward multiple, predefined outcomes, such as simultaneously misclassifying “non-motorized lane” signs as “motorized lane” and “pedestrian” as “plastic bag”. To expose this threat, we introduce Semantic-Aware Universal Perturbations (SAUPs), which induce varied outcomes based on the semantics of the inputs. We overcome optimization challenges by developing an effective algorithm, which searches for perturbations in normalized space with a semantic separation strategy. To evaluate the practical threat of SAUPs, we present RIST, a new real-world image dataset with fine-grained semantic annotations. Extensive experiments on three multimodal large language models demonstrate their vulnerability, achieving a 70