Well, that escalated quickly: The Single-Turn Crescendo Attack (STCA)


arxiv


Source

https://arxiv.org/abs/2409.03131

PDF

https://arxiv.org/pdf/2409.03131

Paper Information

Authors: Alan Aqrawi; Arian Abbasi
Published: September 5, 2024
Updated: September 11, 2024
Affiliation: University of Cologne
Country: Germany

Labels Estimated by AI

Content Moderation, Attack Method, LLM Security

These labels were automatically added by AI and may be inaccurate.

Abstract

This paper introduces the Single-Turn Crescendo Attack (STCA), a new adversarial attack on large language models (LLMs). It builds on the multi-turn crescendo attack introduced by Russinovich, Salem, and Eldan (2024), which gradually escalates the conversational context over several turns to provoke harmful responses; the STCA achieves similar outcomes in a single interaction. By condensing the escalation into one carefully crafted prompt, the STCA bypasses the moderation filters that LLMs typically use to prevent inappropriate outputs. The technique exposes vulnerabilities in current LLMs and underscores the need for stronger safeguards in responsible AI (RAI). According to the authors, this single-turn approach has not been explored before.