AI Security Portal K Program
OCELOT: Inference-Leakage Budgets for Privacy-Preserving LLM Agents
Abstract
Large language model (LLM) agents increasingly act on a user's behalf: they read personal files, call tools, and transact with external services, and at every step they may leak personally identifiable information (PII) across trust boundaries. Privacy here is a property not of a single output but of an entire trajectory, and three features make it hard to enforce. Leakage is cumulative, since individually innocuous releases to honest-but-curious or colluding sinks accumulate into inferences about a protected secret; it is bidirectional, since a malicious observation can inject instructions that turn the agent's own reasoning model against the user; and it is task-dependent, since the same field can be necessary for one recipient yet gratuitous for another. Per-release contextual-integrity filters, information-flow controls, and posterior-leakage monitors each address part of this problem, but none controls cumulative, inference-based leakage at runtime. We recast agent privacy as posterior-risk control and present OCELOT, a runtime mediator that, instead of filtering outputs, budgets how much an adversary's belief about a secret may improve over a trajectory. Its mechanism, Witness-Verified Declassification, separates judgment from trust. An untrusted, locally fine-tuned defender model inspects each candidate release and emits structured evidence (labeled atoms and proposed declassification operators). A deterministic verifier then audits this evidence, charges a certified min-entropy cost for the chosen variant, and authorizes the least-disclosing useful release under a sink-trust-weighted budget recorded on a tamper-evident ledger. Across diverse agent benchmarks, and compared with recent defenses, OCELOT attains significantly lower leakage at higher task utility. It resists adaptive prompt-injection and jailbreak attacks as well as cumulative inference and sink collusion, and it adds only modest overhead.
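The budgeting idea in the abstract can be illustrated with a minimal Python sketch. The sketch rests on several simplifying assumptions. The secret ranges over a small finite set with a known prior. Each proposed declassification operator is a deterministic function of the secret. All sinks are pooled, as if every sink colludes. Finally, "sink-trust-weighted" is modeled simply as a base budget multiplied by a per-sink trust weight. Under these assumptions, the cost of a release is the standard min-entropy leakage from quantitative information flow, log2(V_post / V_prior), computed over everything released so far, and the ledger is a plain SHA-256 hash chain. The names `Mediator`, `authorize`, `Ledger`, and `base_budget_bits`, along with the ZIP/age example, are hypothetical. The sketch is not OCELOT's implementation and does not model the defender model, witness atoms, or certification.

```python
import hashlib
import json
import math
from collections import defaultdict
from typing import Callable, Dict, Hashable, List, Optional, Sequence, Tuple

Secret = Tuple[str, int]                 # hypothetical secret: (ZIP code, age)
Operator = Callable[[Secret], Hashable]  # deterministic declassification operator


def min_entropy_leakage(prior: Dict[Secret, float], ops: Sequence[Operator]) -> float:
    """Bits of min-entropy leakage when an adversary sees all `ops` outputs jointly."""
    v_prior = max(prior.values())  # chance of guessing the secret with no releases
    best: Dict[Tuple[Hashable, ...], float] = defaultdict(float)
    for x, p in prior.items():
        y = tuple(op(x) for op in ops)  # joint observation produced by secret x
        best[y] = max(best[y], p)       # adversary guesses the likeliest x for each y
    v_post = sum(best.values())         # posterior vulnerability
    return math.log2(v_post / v_prior)


class Ledger:
    """Append-only SHA-256 hash chain: editing any stored record breaks verify()."""

    GENESIS = "0" * 64

    def __init__(self) -> None:
        self.entries: List[dict] = []

    @staticmethod
    def _digest(prev: str, record: dict) -> str:
        payload = json.dumps({"prev": prev, "record": record}, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def append(self, record: dict) -> None:
        prev = self.entries[-1]["digest"] if self.entries else self.GENESIS
        self.entries.append({"record": record, "digest": self._digest(prev, record)})

    def verify(self) -> bool:
        prev = self.GENESIS
        for entry in self.entries:
            if self._digest(prev, entry["record"]) != entry["digest"]:
                return False
            prev = entry["digest"]
        return True


class Mediator:
    """Hypothetical mediator that authorizes the least-disclosing useful variant."""

    def __init__(self, prior: Dict[Secret, float], base_budget_bits: float,
                 trust: Dict[str, float]) -> None:
        self.prior = prior
        self.base_budget_bits = base_budget_bits
        self.trust = trust                 # assumed per-sink weight in (0, 1]
        self.history: List[Operator] = []  # pooled over all sinks: worst-case collusion
        self.ledger = Ledger()

    def authorize(self, sink: str,
                  variants: Sequence[Tuple[str, Operator, bool]]) -> Optional[str]:
        # Variants (name, operator, useful) may come from an untrusted model; the
        # cumulative cost of each is recomputed here instead of being taken on trust.
        budget = self.base_budget_bits * self.trust[sink]
        scored = [(min_entropy_leakage(self.prior, self.history + [op]), name, op)
                  for name, op, useful in variants if useful]
        cost, name, op = min(scored, key=lambda t: t[0]) if scored else (None, None, None)
        if cost is None or cost > budget:
            self.ledger.append({"sink": sink, "variant": None, "cumulative_bits": cost})
            return None
        self.history.append(op)
        self.ledger.append({"sink": sink, "variant": name, "cumulative_bits": cost})
        return name


zips, ages = ["02139", "02140", "94105", "94107"], [25, 34, 47, 58]
prior = {(z, a): 1 / 16 for z in zips for a in ages}  # uniform prior over 16 secrets
m = Mediator(prior, base_budget_bits=3.0, trust={"shipping": 1.0, "ads": 0.5})

# Shipping (budget 3 bits): exact ZIP costs 2 bits, the 3-digit prefix 1 bit.
print(m.authorize("shipping", [
    ("exact_zip", lambda s: s[0], True),
    ("zip3", lambda s: s[0][:3], True),
    ("redacted", lambda s: "*", False),  # zero leakage, but not useful here
]))  # zip3

# Ads (budget 1.5 bits): over_40 alone costs 1 bit, but 2 bits on top of zip3.
print(m.authorize("ads", [
    ("age_decade", lambda s: s[1] // 10 * 10, True),
    ("over_40", lambda s: s[1] >= 40, True),
]))  # None
print(m.ledger.verify())  # True
```

The second call shows why cumulative accounting matters. Taken alone, `over_40` would fit the 1.5-bit budget for the less-trusted sink. Because all sinks are treated here as one colluding observer, however, it is charged against what `zip3` has already revealed, and the release is denied. The denial is still written to the ledger.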