The Automation Advantage in AI Red Teaming

arxiv

AI Security Portal bot

Information in the literature database is collected automatically.

Source

https://arxiv.org/abs/2504.19855

PDF

https://arxiv.org/pdf/2504.19855

Bibliographic Information

Authors: Rob Mulla, Ads Dawson, Vincent Abruzzon, Brian Greunke, Nick Landers, Brad Palm, Will Pearce
Published: 2025-04-28
Updated: 2025-04-29
Affiliation: Dreadnode
Country of affiliation: United States of America
Venue: Computing Research Repository (CoRR)

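Labels Estimated by AI

Prompt leaking, Attack methods, Effects of automation

Note: These labels were added automatically by AI and may therefore be inaccurate.
For details, see "About the Literature Database".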

Abstract

This paper analyzes Large Language Model (LLM) security vulnerabilities based on data from Crucible, encompassing 214,271 attack attempts by 1,674 users across 30 LLM challenges. Our findings reveal automated approaches significantly outperform manual techniques (69.5% vs 47.6% success rate), despite only 5.2% of users employing automation. We demonstrate that automated approaches excel in systematic exploration and pattern matching challenges, while manual approaches retain speed advantages in certain creative reasoning scenarios, often solving problems 5x faster when successful. Challenge categories requiring systematic exploration are most effectively targeted through automation, while intuitive challenges sometimes favor manual techniques for time-to-solve metrics. These results illuminate how algorithmic testing is transforming AI red-teaming practices, with implications for both offensive security research and defensive measures. Our analysis suggests optimal security testing combines human creativity for strategy development with programmatic execution for thorough exploration.

External Datasets

Crucible LLM security challenges