Abstract
With the rapid advancement of multimodal large language models (MLLMs),
concerns regarding their security have increasingly captured the attention of
both academia and industry. Although MLLMs are vulnerable to jailbreak attacks,
designing effective jailbreak attacks poses unique challenges, especially given
the highly constrained adversarial capabilities in real-world deployment
scenarios. Previous works concentrate risks into a single modality, resulting
in limited jailbreak performance. In this paper, we propose a heuristic-induced
multimodal risk distribution jailbreak attack method, called HIMRD, which is
black-box and consists of two elements: multimodal risk distribution strategy
and heuristic-induced search strategy. The multimodal risk distribution
strategy is used to distribute harmful semantics into multiple modalities to
effectively circumvent the single-modality protection mechanisms of MLLMs. The
heuristic-induced search strategy identifies two types of prompts: the
understanding-enhancing prompt, which helps MLLMs reconstruct the malicious
prompt, and the inducing prompt, which increases the likelihood of affirmative
outputs over refusals, enabling a successful jailbreak attack. HIMRD achieves
an average attack success rate (ASR) of 90% across seven open-source MLLMs and
an average ASR of around 68% across three closed-source MLLMs. HIMRD reveals
cross-modal security vulnerabilities in current MLLMs and underscores the
imperative for developing defensive strategies to mitigate such emerging risks.
Code is available at https://github.com/MaTengSYSU/HIMRD-jailbreak.
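The risk-distribution idea described above can be illustrated with a minimal sketch. This is not the authors' implementation: the placeholder token `[MASK]`, the function name `distribute_risk`, and the instruction wording are assumptions for illustration. The sketch shows how a sensitive phrase could be moved out of the text prompt so that neither modality alone carries the full harmful semantics; in practice, the returned phrase would be rendered typographically into the image modality.

```python
def distribute_risk(prompt: str, sensitive: str) -> tuple[str, str]:
    """Split harmful semantics across modalities (illustrative only).

    Replaces `sensitive` in the text prompt with a neutral placeholder
    and returns the phrase separately, to be carried by the image side.
    Returns (text_prompt, image_text).
    """
    # Text modality: the prompt with the sensitive phrase masked out.
    masked = prompt.replace(sensitive, "[MASK]")
    # Instruct the model to recombine the two modalities at inference time.
    text_prompt = (
        "The image contains a phrase. Substitute it for [MASK] "
        "in the following request, then answer: " + masked
    )
    # Image modality: this phrase would be rendered as text inside an image.
    return text_prompt, sensitive


text_prompt, image_text = distribute_risk(
    "Explain how to do X with DANGEROUS-PHRASE", "DANGEROUS-PHRASE"
)
```

After this split, the text prompt contains only the placeholder, and a single-modality filter inspecting either channel in isolation would not see the full request.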