When LLMs Copy to Think: Uncovering Copy-Guided Attacks in Reasoning LLMs

Information in the literature database is collected automatically.

Source

https://arxiv.org/abs/2507.16773

PDF

https://arxiv.org/pdf/2507.16773

Paper Information

Author: Yue Li, Xiao Li, Hao Wu, Yue Zhang, Fengyuan Xu, Xiuzhen Cheng, Sheng Zhong
Published: 7-23-2025
Affiliation: National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, Jiangsu, China
Country: China
Conference: Computing Research Repository (CoRR)

Labels Estimated by AI

Prompt leaking, Attack Method, Model DoS

These labels were automatically added by AI and may be inaccurate.

Abstract

Large Language Models (LLMs) have become integral to automated code analysis, enabling tasks such as vulnerability detection and code comprehension. However, their integration introduces novel attack surfaces. In this paper, we identify and investigate a new class of prompt-based attacks, termed Copy-Guided Attacks (CGA), which exploit the inherent copying tendencies of reasoning-capable LLMs. By injecting carefully crafted triggers into external code snippets, adversaries can induce the model to replicate malicious content during inference. This behavior enables two classes of vulnerabilities: inference length manipulation, where the model generates abnormally short or excessively long reasoning traces; and inference result manipulation, where the model produces misleading or incorrect conclusions. We formalize CGA as an optimization problem and propose a gradient-based approach to synthesize effective triggers. Empirical evaluation on state-of-the-art reasoning LLMs shows that CGA reliably induces infinite loops, premature termination, false refusals, and semantic distortions in code analysis tasks. While highly effective in targeted settings, we observe challenges in generalizing CGA across diverse prompts due to computational constraints, posing an open question for future research. Our findings expose a critical yet underexplored vulnerability in LLM-powered development pipelines and call for urgent advances in prompt-level defense mechanisms.
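
The abstract says CGA is formalized as an optimization problem and solved with a gradient-based method, but it does not give the formulation. As a rough sketch only, a generic targeted-trigger objective of the kind commonly used in gradient-based prompt attacks could be written as below. All symbols are assumptions introduced for illustration and are not taken from the paper: $p_\theta$ is the reasoning model, $q$ the analysis prompt, $c$ the external code snippet, $t$ a trigger of $k$ tokens from vocabulary $V$ injected into $c$ (written $c \oplus t$), and $y^\ast = (y^\ast_1, \dots, y^\ast_m)$ the content the adversary wants the model to copy into its reasoning.

```latex
\[
t^\star = \arg\min_{t \in V^{k}} \mathcal{L}(t),
\qquad
\mathcal{L}(t) = -\sum_{i=1}^{m}
  \log p_\theta\!\left( y^\ast_i \,\middle|\, q,\; c \oplus t,\; y^\ast_{<i} \right)
\]
```

Under this reading, a gradient-based search would use the gradient of $\mathcal{L}$ with respect to the trigger's token embeddings to propose token substitutions in $t$. Whether the paper uses this exact loss or search procedure is not stated in the abstract.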