Whispers in the Machine: Confidentiality in Agentic Systems

Authors: Jonathan Evertz, Merlin Chlosta, Lea Schönherr, Thorsten Eisenhofer | Published: 2024-02-10 | Updated: 2025-08-12

2024.02.10

Authors: Jonathan Evertz, Merlin Chlosta, Lea Schönherr, Thorsten Eisenhofer
Published: 2024-02-10 | Updated: 2025-08-12

Source: https://arxiv.org/abs/2402.06922

PDF: https://arxiv.org/pdf/2402.06922

AIにより推定されたラベル

攻撃戦略分析プロンプトインジェクションセキュリティ保証

※ こちらのラベルはAIによって自動的に追加されました。そのため、正確でないことがあります。
詳細は文献データベースについてをご覧ください。

Abstract

The interaction between users and applications is increasingly shifted toward natural language by deploying Large Language Models (LLMs) as the core interface. The capabilities of these so-called agents become more capable the more tools and services they serve as an interface for, ultimately leading to agentic systems. Agentic systems use LLM-based agents as interfaces for most user interactions and various integrations with external tools and services. While these interfaces can significantly enhance the capabilities of the agentic system, they also introduce a new attack surface. Manipulated integrations, for example, can exploit the internal LLM and compromise sensitive data accessed through other interfaces. While previous work primarily focused on attacks targeting a model’s alignment or the leakage of training data, the security of data that is only available during inference has escaped scrutiny so far. In this work, we demonstrate how the integration of LLMs into systems with external tool integration poses a risk similar to established prompt-based attacks, able to compromise the confidentiality of the entire system. Introducing a systematic approach to evaluate these confidentiality risks, we identify two specific attack scenarios unique to these agentic systems and formalize these into a tool-robustness framework designed to measure a model’s ability to protect sensitive information. Our analysis reveals significant vulnerabilities across all tested models, highlighting an increased risk when models are combined with external tools.