Casting a SPELL: Sentence Pairing Exploration for LLM Limitation-breaking

TOP 文献データベース Casting a SPELL: Sentence Pairing Exploration for LLM Limitation-breaking

AIセキュリティポータルbot

文献データベースの情報は、自動的に収集されています。

Source

https://arxiv.org/abs/2512.21236

PDF

https://arxiv.org/pdf/2512.21236

文献情報

作者: Yifan Huang,Xiaojun Jia,Wenbo Guo,Yuqiang Sun,Yihao Huang,Chong Wang,Yang Liu
公開日: 2025-12-26
所属機関: Nanyang Technological University
所属の国: Singapore
会議名

AIにより推定されたラベル

敵対的攻撃検出プロンプトインジェクションデータ選択戦略

Abstract

Large language models (LLMs) have revolutionized software development through AI-assisted coding tools, enabling developers with limited programming expertise to create sophisticated applications. However, this accessibility extends to malicious actors who may exploit these powerful tools to generate harmful software. Existing jailbreaking research primarily focuses on general attack scenarios against LLMs, with limited exploration of malicious code generation as a jailbreak target. To address this gap, we propose SPELL, a comprehensive testing framework specifically designed to evaluate the weakness of security alignment in malicious code generation. Our framework employs a time-division selection strategy that systematically constructs jailbreaking prompts by intelligently combining sentences from a prior knowledge dataset, balancing exploration of novel attack patterns with exploitation of successful techniques. Extensive evaluation across three advanced code models (GPT-4.1, Claude-3.5, and Qwen2.5-Coder) demonstrates SPELL's effectiveness, achieving attack success rates of 83.75%, 19.38%, and 68.12% respectively across eight malicious code categories. The generated prompts successfully produce malicious code in real-world AI development tools such as Cursor, with outputs confirmed as malicious by state-of-the-art detection systems at rates exceeding 73%. These findings reveal significant security gaps in current LLM implementations and provide valuable insights for improving AI safety alignment in code generation applications.