Abstract
This paper introduces a new method for adversarial attacks on large language
models (LLMs) called the Single-Turn Crescendo Attack (STCA). Building on the
multi-turn crescendo attack method introduced by Russinovich, Salem, and Eldan
(2024), which gradually escalates the context to provoke harmful responses, the
STCA achieves similar outcomes in a single interaction. By condensing the
escalation into a single, well-crafted prompt, the STCA bypasses typical
moderation filters that LLMs use to prevent inappropriate outputs. This
technique reveals vulnerabilities in current LLMs and emphasizes the importance
of stronger safeguards in responsible AI (RAI). The single-turn formulation of
the crescendo attack has not been previously explored.