Abstract
Large Language Models (LLMs) have exhibited remarkable capabilities but
remain vulnerable to jailbreaking attacks, which can elicit harmful content
from the models by manipulating the input prompts. Existing black-box
jailbreaking techniques primarily rely on static prompts crafted with a single,
non-adaptive strategy, or employ rigid combinations of several underperforming
attack methods, which limits their adaptability and generalization. To address
these limitations, we propose MAJIC, a Markovian adaptive jailbreaking
framework that attacks black-box LLMs by iteratively combining diverse
innovative disguise strategies. MAJIC first establishes a "Disguise Strategy
Pool" by refining existing strategies and introducing several innovative
approaches. To further improve attack performance and efficiency, MAJIC
formulates the sequential selection and fusion of strategies in the pool as a
Markov chain. Under this formulation, MAJIC initializes and employs a Markov
matrix to guide the strategy composition, where transition probabilities
between strategies are dynamically adapted based on attack outcomes, thereby
enabling MAJIC to learn and discover effective attack pathways tailored to the
target model. Our empirical results demonstrate that MAJIC significantly
outperforms existing jailbreak methods on prominent models such as GPT-4o and
Gemini-2.0-flash, achieving an attack success rate above 90% with fewer than 15
queries per attempt on average.
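The abstract describes strategy composition as a Markov chain whose transition probabilities adapt to attack outcomes, but it does not give the initialization or the update rule. The following is a minimal Python sketch of an adaptive row-stochastic transition matrix under stated assumptions, not the authors' implementation. States are opaque placeholder labels. The matrix starts uniform. A successful path multiplies each transition it used by `1 + lr`, and a failed path multiplies each one by `1 - lr`. Each touched row is then renormalized and mixed with a small uniform term `eps`. The names `AdaptiveMarkovChain`, `run_episode`, `lr`, and `eps` are hypothetical, and the outcome signal is a caller-supplied stand-in.

```python
import random
from typing import Callable, List, Tuple


class AdaptiveMarkovChain:
    """Row-stochastic transition matrix over opaque strategy labels.

    Hypothetical sketch: the multiplicative update and uniform mixing are
    illustrative assumptions, not the rule used by MAJIC.
    """

    def __init__(self, states: List[str], lr: float = 0.1, eps: float = 0.01):
        self.states = states
        self.n = len(states)
        self.lr = lr    # assumed step size for reward/penalty
        self.eps = eps  # assumed uniform mixing weight; keeps every entry >= eps / n
        # Uniform initialization: every row sums to 1.
        self.P = [[1.0 / self.n] * self.n for _ in range(self.n)]

    def next_state(self, i: int, rng: random.Random) -> int:
        # Sample the next state index from row i of the transition matrix.
        return rng.choices(range(self.n), weights=self.P[i], k=1)[0]

    def update(self, path: List[int], success: bool) -> None:
        # Scale every transition used along the path up (success) or down
        # (failure), then renormalize the row and mix in a uniform term.
        factor = 1.0 + self.lr if success else 1.0 - self.lr
        for a, b in zip(path, path[1:]):
            row = list(self.P[a])
            row[b] *= factor
            total = sum(row)
            self.P[a] = [(1 - self.eps) * p / total + self.eps / self.n for p in row]


def run_episode(
    chain: AdaptiveMarkovChain,
    start: int,
    length: int,
    evaluate: Callable[[List[str]], bool],
    rng: random.Random,
) -> Tuple[List[int], bool]:
    # Walk the chain for `length` states, score the composed path with the
    # caller-supplied outcome signal, and adapt the matrix accordingly.
    path = [start]
    for _ in range(length - 1):
        path.append(chain.next_state(path[-1], rng))
    success = evaluate([chain.states[i] for i in path])
    chain.update(path, success)
    return path, success


if __name__ == "__main__":
    # Toy, purely simulated outcome: a path "succeeds" if it ends in "s2".
    rng = random.Random(0)
    chain = AdaptiveMarkovChain(["s0", "s1", "s2"])
    for _ in range(200):
        run_episode(chain, start=0, length=2, evaluate=lambda p: p[-1] == "s2", rng=rng)
    print([round(p, 3) for p in chain.P[0]])
```

Running the module prints row 0 after 200 simulated episodes. The toy outcome rewards only the transition from s0 to s2, so that entry grows. The uniform mixing term keeps every other entry at or above `eps / n`, which preserves some exploration. The paper may balance exploration and exploitation differently.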