Well, that escalated quickly: The Single-Turn Crescendo Attack (STCA)


arXiv

Source

https://arxiv.org/abs/2409.03131

PDF

https://arxiv.org/pdf/2409.03131

Bibliographic information

Authors: Alan Aqrawi; Arian Abbasi
Published: 2024-09-05
Updated: 2024-09-11
Affiliation: University of Cologne
Country of affiliation: Germany
Conference:

Labels estimated by AI

Content moderation; Attack methods; LLM security

Note: These labels were added automatically by AI and may therefore be inaccurate.

Abstract

This paper introduces a new method for adversarial attacks on large language models (LLMs) called the Single-Turn Crescendo Attack (STCA). Building on the multi-turn crescendo attack method introduced by Russinovich, Salem, and Eldan (2024), which gradually escalates the context to provoke harmful responses, the STCA achieves similar outcomes in a single interaction. By condensing the escalation into a single, well-crafted prompt, the STCA bypasses typical moderation filters that LLMs use to prevent inappropriate outputs. This technique reveals vulnerabilities in current LLMs and emphasizes the importance of stronger safeguards in responsible AI (RAI). The STCA offers a novel method that has not been previously explored.
