Abstract
Content Warning: This paper may contain unsafe or harmful content generated
by LLMs that may be offensive to readers. Large Language Models (LLMs) are
widely used as tooling platforms through structured output APIs that enforce
syntax compliance, enabling robust integration with existing software such as
agent systems. However, the very feature that enables grammar-guided structured
output introduces significant security vulnerabilities.
In this work, we reveal a critical control-plane attack surface orthogonal to
traditional data-plane vulnerabilities. We introduce Constrained Decoding
Attack (CDA), a novel jailbreak class that weaponizes structured output
constraints to bypass safety mechanisms. Unlike prior attacks focused on input
prompts, CDA operates by embedding malicious intent in schema-level grammar
rules (control-plane) while maintaining benign surface prompts (data-plane). We
instantiate this with a proof-of-concept Chain Enum Attack, which achieves a
96.2% attack success rate with a single query across proprietary and
open-weight LLMs, including GPT-4o and Gemini-2.0-flash, on five safety
benchmarks. Our
findings identify a critical security blind spot in current LLM architectures
and urge a paradigm shift in LLM safety to address control-plane
vulnerabilities, as existing mechanisms focused solely on data-plane threats
leave critical systems exposed.
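
To make the control-plane versus data-plane distinction concrete, the sketch below shows how a benign prompt (data plane) can be paired with a constraining JSON Schema (control plane) through an OpenAI-style structured output API. The schema name, field names, and placeholder enum values are illustrative assumptions rather than the paper's actual payload; in a CDA, adversarial intent would be encoded in such schema-level constraints, not in the prompt text.

```python
# Minimal sketch of the control-plane / data-plane split, assuming an
# OpenAI-style structured output interface. The schema name, fields, and the
# harmless placeholder enum value are hypothetical; they only indicate WHERE a
# Constrained Decoding Attack would place its constraints, not an attack payload.
from openai import OpenAI

client = OpenAI()

# Control plane: schema-level grammar rules that constrain decoding.
# In a Chain Enum Attack, the malicious intent would be embedded here,
# e.g., by forcing the decoder to emit and then continue a fixed enum string.
response_schema = {
    "name": "constrained_answer",
    "strict": True,
    "schema": {
        "type": "object",
        "properties": {
            # Enum forces the first field to a predetermined string.
            "step_1": {"type": "string", "enum": ["PLACEHOLDER_PREFIX"]},
            # The model must fill this field under the grammar constraint.
            "step_2": {"type": "string"},
        },
        "required": ["step_1", "step_2"],
        "additionalProperties": False,
    },
}

# Data plane: the surface prompt stays benign, so prompt-level safety
# filters see nothing suspicious.
completion = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Fill in the requested fields."}],
    response_format={"type": "json_schema", "json_schema": response_schema},
)

print(completion.choices[0].message.content)
```

The key point is that the grammar, not the prompt, carries the constraint: the decoder is forced to emit tokens that match the schema, and this control-plane channel is what CDA exploits while the data-plane prompt remains benign.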