These labels were automatically added by AI and may be inaccurate. For details, see About Literature Database.
Abstract
Large Language Models (LLMs) have revolutionized natural language processing
but remain vulnerable to jailbreak attacks, especially multi-turn jailbreaks
that distribute malicious intent across benign exchanges and bypass alignment
mechanisms. Existing approaches often explore the adversarial space poorly,
rely on hand-crafted heuristics, or lack systematic query refinement. We
present NEXUS (Network Exploration for eXploiting Unsafe Sequences), a modular
framework for constructing, refining, and executing optimized multi-turn
attacks. NEXUS comprises: (1) ThoughtNet, which hierarchically expands a
harmful intent into a structured semantic network of topics, entities, and
query chains; (2) a feedback-driven Simulator that iteratively refines and
prunes these chains through attacker-victim-judge LLM collaboration using
harmfulness and semantic-similarity benchmarks; and (3) a Network Traverser
that adaptively navigates the refined query space for real-time attacks. This
pipeline uncovers stealthy, high-success adversarial paths across LLMs. On
several closed-source and open-source LLMs, NEXUS increases attack success rate
by 2.1% to 19.4% over prior methods. Code: https://github.com/inspire-lab/NEXUS