AI agents, particularly those powered by large language models, have demonstrated
exceptional capabilities across applications where precision and efficacy are
essential. However, these agents carry inherent risks, including the potential
for unsafe or biased actions, vulnerability to adversarial attacks, a lack of
transparency, and a tendency to hallucinate. As AI agents become more prevalent
in critical industry sectors, implementing effective safety protocols becomes
increasingly important. This paper addresses the pressing need for safety
measures in AI systems, especially those that collaborate with human teams. We
propose and evaluate three frameworks to
enhance safety protocols in AI agent systems: an LLM-powered input-output
filter, a safety agent integrated within the system, and a hierarchical
delegation-based system with embedded safety checks. We implement these
frameworks and test them against a suite of unsafe agentic use cases, providing
a comprehensive evaluation of their effectiveness in mitigating the risks of AI
agent deployment. We conclude that these
frameworks can significantly strengthen the safety and security of AI agent
systems, reducing the likelihood of harmful actions or outputs. Our work
contributes to the ongoing effort to build safe and reliable AI systems,
particularly in automated operations, and provides a foundation for developing
robust guardrails that support the responsible use of AI agents in real-world
applications.
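
To make the first framework concrete, the sketch below shows one way an
LLM-powered input-output filter could wrap an agent, screening both the user's
request and the agent's response. It is a minimal illustration under stated
assumptions, not the implementation evaluated here: the moderation prompt, the
generic `llm` callable, and the `is_flagged` and `filtered_agent` helpers are
all hypothetical names introduced for exposition.

```python
from typing import Callable

# Hypothetical moderation prompt; a real deployment would tune this
# to its own threat model and response format.
MODERATION_PROMPT = (
    "You are a safety filter. Reply with exactly UNSAFE if the text below "
    "requests or contains a harmful action, and SAFE otherwise.\n\n{text}"
)

def is_flagged(llm: Callable[[str], str], text: str) -> bool:
    """Hypothetical helper: ask a moderation LLM to classify `text`."""
    verdict = llm(MODERATION_PROMPT.format(text=text))
    return verdict.strip().upper().startswith("UNSAFE")

def filtered_agent(
    llm: Callable[[str], str],
    agent: Callable[[str], str],
    user_input: str,
    refusal: str = "Request declined by the safety filter.",
) -> str:
    """Screen the request before the agent runs and the response after."""
    # Input filter: block unsafe requests before the agent acts on them.
    if is_flagged(llm, user_input):
        return refusal
    response = agent(user_input)
    # Output filter: catch unsafe content the agent may still produce.
    if is_flagged(llm, response):
        return refusal
    return response
```

Routing both directions of traffic through the same classifier keeps the filter
model-agnostic; in practice the moderation model, prompt, and refusal behavior
would be tuned to the specific deployment.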