MAJIC: Markovian Adaptive Jailbreaking via Iterative Composition of Diverse Innovative Strategies

Source

https://arxiv.org/abs/2508.13048

PDF

https://arxiv.org/pdf/2508.13048

Paper Information

Authors: Weiwei Qi, Shuo Shao, Wei Gu, Tianhang Zheng, Puning Zhao, Zhan Qin, Kui Ren
Published: 2025-08-19
Affiliation: The State Key Laboratory of Blockchain and Data Security, Zhejiang University
Country: China
Conference: AAAI Conference on Artificial Intelligence (AAAI)

Labels Estimated by AI

Prompt Injection

Algorithm Design

Attack Type

These labels were automatically added by AI and may be inaccurate.

Abstract

Large Language Models (LLMs) have exhibited remarkable capabilities but remain vulnerable to jailbreaking attacks, which can elicit harmful content from the models by manipulating the input prompts. Existing black-box jailbreaking techniques primarily rely on static prompts crafted with a single, non-adaptive strategy, or employ rigid combinations of several underperforming attack methods, which limits their adaptability and generalization. To address these limitations, we propose MAJIC, a Markovian adaptive jailbreaking framework that attacks black-box LLMs by iteratively combining diverse innovative disguise strategies. MAJIC first establishes a "Disguise Strategy Pool" by refining existing strategies and introducing several innovative approaches. To further improve attack performance and efficiency, MAJIC formulates the sequential selection and fusion of strategies in the pool as a Markov chain. Under this formulation, MAJIC initializes and employs a Markov matrix to guide the strategy composition, where transition probabilities between strategies are dynamically adapted based on attack outcomes, thereby enabling MAJIC to learn and discover effective attack pathways tailored to the target model. Our empirical results demonstrate that MAJIC significantly outperforms existing jailbreak methods on prominent models such as GPT-4o and Gemini-2.0-flash, achieving over 90% attack success rate with fewer than 15 queries per attempt on average.
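
The abstract describes a Markov matrix whose transition probabilities adapt to outcomes but does not give the update rule. The sketch below is only an illustration of that general idea on abstract state indices: a row-stochastic matrix, sampling of the next state from the current row, and a multiplicative update followed by renormalization. The function names, the uniform initialization, the multiplicative rule, and the `lr` parameter are all assumptions made for this example and are not taken from the paper.

```python
import random


def init_matrix(n):
    """Uniform row-stochastic matrix (assumed initialization)."""
    return [[1.0 / n] * n for _ in range(n)]


def next_state(matrix, current, rng):
    """Sample the next state index from the row of the current state."""
    return rng.choices(range(len(matrix)), weights=matrix[current], k=1)[0]


def update(matrix, src, dst, success, lr=0.2):
    """Scale the src -> dst probability up on success or down on failure,
    then renormalize the row so it still sums to 1.
    This multiplicative rule is a hypothetical stand-in, not the paper's."""
    row = list(matrix[src])
    row[dst] *= (1.0 + lr) if success else (1.0 - lr)
    total = sum(row)
    matrix[src] = [p / total for p in row]


# Example: three abstract states, one observed success on transition 0 -> 1.
rng = random.Random(0)
m = init_matrix(3)
update(m, 0, 1, success=True)
step = next_state(m, 0, rng)
```

Renormalizing after each update keeps every row a valid probability distribution, so transitions that were not reinforced lose probability mass implicitly rather than through a separate penalty.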

External Datasets

StrongReject

HarmBench

AdvBench