Abstract
Large Language Models (LLMs) have become integral to automated code analysis,
enabling tasks such as vulnerability detection and code comprehension. However,
their integration introduces novel attack surfaces. In this paper, we identify
and investigate a new class of prompt-based attacks, termed Copy-Guided Attacks
(CGA), which exploit the inherent copying tendencies of reasoning-capable LLMs.
By injecting carefully crafted triggers into external code snippets,
adversaries can induce the model to replicate malicious content during
inference. This behavior enables two classes of vulnerabilities: inference
length manipulation, where the model generates abnormally short or excessively
long reasoning traces; and inference result manipulation, where the model
produces misleading or incorrect conclusions. We formalize CGA as an
optimization problem and propose a gradient-based approach to synthesize
effective triggers. Empirical evaluation on state-of-the-art reasoning LLMs
shows that CGA reliably induces infinite loops, premature termination, false
refusals, and semantic distortions in code analysis tasks. While CGA is highly
effective in targeted settings, we observe challenges in generalizing it
across diverse prompts due to computational constraints, posing an open
question for future research. Our findings expose a critical yet underexplored
vulnerability in LLM-powered development pipelines and call for urgent advances
in prompt-level defense mechanisms.