OCELOT: Inference-Leakage Budgets for Privacy-Preserving LLM Agents | AI Security Portal


arxiv

OCELOT: Inference-Leakage Budgets for Privacy-Preserving LLM Agents

AI Security Portal bot

Information in the literature database is collected automatically.

Source

https://arxiv.org/abs/2606.12341

PDF

https://arxiv.org/pdf/2606.12341

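Bibliographic information

Authors: Jin Xie, Songze Li
Published: 2026-6-11
Affiliation
Country of affiliation
Conference

Labels estimated by AI

Prompt injection, Privacy-preserving techniques, New label reflecting analysis of data leaks and model issues

Note: these labels were added automatically by AI and may therefore be inaccurate.
For details, see "About the literature database".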

Abstract

Large language model (LLM) agents increasingly act on a user's behalf -- reading personal files, calling tools, transacting with external services -- possibly leaking personally identifiable information (PII) across trust boundaries at every step. Privacy here is a property not of a single output but of an entire trajectory, and three properties make it hard: leakage is cumulative, as individually innocuous releases accumulate across honest-but-curious or colluding sinks into inferences about a protected secret; bidirectional, as a malicious observation can inject instructions that turn the agent's own reasoning model against the user; and task-dependent, as the same field is necessary for one recipient yet gratuitous for another. Per-release contextual-integrity filters, information-flow controls, and posterior-leakage monitors each address part of this but none controls cumulative, inference-based leakage at runtime. We recast agent privacy as posterior-risk control and present OCELOT, a runtime mediator that budgets how much an adversary's belief about a secret may improve across a trajectory, rather than filtering outputs. Its mechanism, Witness-Verified Declassification, separates judgment from trust: an untrusted, locally fine-tuned defender model inspects each candidate release and emits structured evidence -- labeled atoms and proposed declassification operators -- which a deterministic verifier audits, charging a certified min-entropy cost for the chosen variant and authorizing the least-disclosing useful release under a sink-trust-weighted budget recorded on a tamper-evident ledger. Across diverse agent benchmarks and recent defenses, OCELOT attains significantly lower leakage at higher task utility, resists adaptive injection, jailbreak, cumulative inference, and sink collusion, and adds only modest overhead.
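
The budgeting idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes the "min-entropy cost" of a release is the min-entropy leakage of quantitative information flow (Smith, 2009, listed in the references), that costs simply add up across a trajectory, that a sink's trust enters as a multiplicative weight, and that the tamper-evident ledger is a SHA-256 hash chain. The names `min_entropy_leakage`, `LeakageLedger`, `sink_weight`, and the sink `calendar-api` are hypothetical.

```python
import hashlib
import json
import math
from dataclasses import dataclass, field


def min_entropy_leakage(prior, channel):
    """Min-entropy leakage in bits: log2(posterior / prior vulnerability).

    prior[s] is the adversary's prior over secret values s;
    channel[s][o] is P(observation o | secret s).
    """
    prior_vuln = max(prior.values())
    observations = {o for row in channel.values() for o in row}
    post_vuln = sum(
        max(prior[s] * channel[s].get(o, 0.0) for s in prior)
        for o in observations
    )
    return math.log2(post_vuln / prior_vuln)


def _digest(record):
    # Canonical JSON so the same record always hashes to the same value.
    return hashlib.sha256(json.dumps(record, sort_keys=True).encode()).hexdigest()


@dataclass
class LeakageLedger:
    """Hash-chained record of charged releases (an assumed design)."""
    budget_bits: float
    spent_bits: float = 0.0
    head: str = "0" * 64
    entries: list = field(default_factory=list)

    def charge(self, sink, cost_bits, sink_weight=1.0):
        # Assumption: a less trusted sink gets a larger weight.
        weighted = cost_bits * sink_weight
        if self.spent_bits + weighted > self.budget_bits:
            return False  # release refused, nothing recorded
        record = {"sink": sink, "cost_bits": weighted, "prev": self.head}
        self.head = _digest(record)
        self.entries.append((record, self.head))
        self.spent_bits += weighted
        return True

    def verify(self):
        # Recompute the chain; any edited record breaks a digest or a link.
        prev = "0" * 64
        for record, digest in self.entries:
            if record["prev"] != prev or _digest(record) != digest:
                return False
            prev = digest
        return prev == self.head


# A secret with four equally likely values.
prior = {s: 0.25 for s in ("A", "B", "C", "D")}
# Coarse release: reveals only which half the secret lies in.
coarse = {"A": {"lo": 1.0}, "B": {"lo": 1.0}, "C": {"hi": 1.0}, "D": {"hi": 1.0}}
# Exact release: reveals the secret itself.
exact = {s: {s: 1.0} for s in prior}

ledger = LeakageLedger(budget_bits=1.5)
first = ledger.charge("calendar-api", min_entropy_leakage(prior, coarse))
second = ledger.charge("calendar-api", min_entropy_leakage(prior, exact))
```

Under these assumptions the coarse release costs 1 bit and fits the 1.5-bit budget, while the exact release costs 2 bits and is refused, which mirrors the idea of authorizing the least-disclosing useful variant. Summing per-release costs is a simplification made for illustration; the paper's certified accounting may differ.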

External datasets

OCELOT-Traj-500

References

ReAct: Synergizing Reasoning and Acting in Language Models

S. Yao, J. Zhao, D. Yu, N. Du, I. Shafran, K. Narasimhan, Y. Cao

Anthropic Engineering Blog

Effective Harnesses for Long-Running Agents

Published: 2025

Harness Engineering: Leveraging Codex in an Agent-First World

Published: 2026

Natural-language agent harnesses

Linyue Pan, Lexiao Zou, Shuo Guo, Jingchen Ni, Hai-Tao Zheng

Published: 2026

30th USENIX Security Symposium (USENIX Security 21)

Extracting training data from large language models

Nicholas Carlini, Florian Tramèr, Eric Wallace, Matthew Jagielski, Ariel Herbert-Voss, Katherine Lee, Adam Roberts, Tom Brown, Dawn Song, Úlfar Erlingsson, Alina Oprea, Colin Raffel

Published: 2021

Beyond memorization: Violating privacy via inference with large language models

R. Staab, M. Vero, M. Balunovic, M. Vechev

Published: 2023

AgentLeak: A full-stack benchmark for privacy leakage in multi-agent LLM systems

F. E. Yagoubi, G. Badu-Marfo, R. A. Mallah

Published: 2026

Advances in Neural Information Processing Systems

AgentDojo: A Dynamic Environment to Evaluate Prompt Injection Attacks and Defenses for LLM Agents

E. Debenedetti, J. Zhang, M. Balunovic, L. Beurer-Kellner, M. Fischer, F. Tramèr

Published: 2024

Can LLMs keep a secret? Testing privacy implications of language models via contextual integrity theory

N. Mireshghallah, H. Kim, X. Zhou, Y. Tsvetkov, M. Sap, R. Shokri, Y. Choi

Published: 2024

Washington Law Review

Privacy as contextual integrity

Helen Nissenbaum

Published: 2004

FlexGuard: Continuous risk scoring for strictness-adaptive LLM content moderation

Z. Ding, J. Li, Z. Lu, J. Shi

Published: 2026

Privacy in action: Towards realistic privacy mitigation and evaluation for LLM-powered agents

S. Wang, F. Yu, X. Liu, X. Qin, J. Zhang, Q. Lin, D. Zhang, S. Rajmohan

Published: 2025

Computing Research Repository (CoRR)

Defeating Prompt Injections by Design

Edoardo Debenedetti, Ilia Shumailov, Tianqi Fan, Jamie Hayes, Nicholas Carlini, Daniel Fabian, Christoph Kern, Chongyang Shi, Andreas Terzis, Florian Tramèr

Published: 2025.3.25

Large Language Models (LLMs) are increasingly deployed in agentic systems that interact with an untrusted environment. However, LLM agents are vulnerable to prompt injection attacks when handling untrusted data. In this paper we propose CaMeL, a robust defense that creates a protective system layer around the LLM, securing it even when underlying models are susceptible to attacks. To operate, CaMeL explicitly extracts the control and data flows from the (trusted) query; therefore, the untrusted data retrieved by the LLM can never impact the program flow. To further improve security, CaMeL uses a notion of a capability to prevent the exfiltration of private data over unauthorized data flows by enforcing security policies when tools are called. We demonstrate effectiveness of CaMeL by solving 77% of tasks with provable security (compared to 84% with an undefended system) in AgentDojo. We release CaMeL at https://github.com/google-research/camel-prompt-injection.

Indirect prompt injection, Prompt injection

RTBAS: Defending LLM agents against prompt injection and privacy leakage

Peter Yong Zhong, Siyuan Chen, Ruiqi Wang, McKenna McCall, Ben L. Titzer, Heather Miller, Phillip B. Gibbons

Published: 2025

The Thirty-ninth Annual Conference on Neural Information Processing Systems Datasets and Benchmarks Track

AgentDAM: Privacy leakage evaluation for autonomous web agents

A. Zharmagambetov, C. Guo, I. Evtimov, M. Pavlova, R. Salakhutdinov, K. Chaudhuri

Published: 2026

Information-theoretic privacy control for sequential multi-agent LLM systems

S. Asif, M. M. Amiri

Published: 2026

International Conference on Foundations of Software Science and Computational Structures

On the foundations of quantitative information flow

Geoffrey Smith

Published: 2009

Proceedings of the 14th ACM conference on Computer and communications security

An information-theoretic model for adaptive side-channel attacks

B. Köpf, D. Basin

Published: 2007

Foundations and Trends in Theoretical Computer Science

The Algorithmic Foundations of Differential Privacy

Cynthia Dwork, Aaron Roth

Published: 2014

Cambridge University Press

Modeling and reasoning with Bayesian networks

A. Darwiche

Published: 2009

Journal of Artificial Intelligence Research

Probabilistic inference in credal networks: new complexity results

D. D. Mauá, C. P. de Campos, A. Benavoli, A. Antonucci

Published: 2014

Proceedings of the 8th international workshop on satisfiability modulo theories

The SMT-LIB standard: Version 2.0

C. Barrett, A. Stump, C. Tinelli

Published: 2010

International conference on Tools and Algorithms for the Construction and Analysis of Systems

Z3: An efficient SMT solver.

Leonardo De Moura, Nikolaj Bjørner

Published: 2008

Constrained decoding for fill-in-the-middle code language models via efficient left and right quotienting of context-sensitive grammars

D. Melcer, N. Fulton, S. K. Gouda, H. Qian

Published: 2024

DeepSeekMath: Pushing the limits of mathematical reasoning in open language models

Zhihong Shao, Peiyi Wang, Qihao Zhu, Runxin Xu, Junxiao Song, Xiao Bi, Haowei Zhang, Mingchuan Zhang, Y. K. Li, Y. Wu, Daya Guo

Published: 2024

Handbook of model checking

Satisfiability modulo theories

C. Barrett, C. Tinelli

Published: 2018

The Llama 3 herd of models

Published: 2024

Qwen3.5: Accelerating productivity with native multimodal agents

Published: 2026

The minimax-m2 series: Mini activations unleashing max real-world intelligence

A. Chen, A. Li, B. Zhou, B. Gong, B. Jiang, B. Dan, C. Yu, C. Wang, C. Ma, C. Zhong

Published: 2026

Deepseek-v4: Towards highly efficient million-token context intelligence

DeepSeek-AI

Published: 2026

ACM Transactions on Programming Languages and Systems

Simplification by cooperating decision procedures

G. Nelson, D. C. Oppen

Published: 1979

Journal of Cryptographic Engineering

High-speed high-security signatures

Daniel J. Bernstein, Niels Duif, Tanja Lange, Peter Schwabe, Bo-Yin Yang

Published: 2012

FIPS PUB

Secure Hash Standard (SHS)

National Institute of Standards and Technology (NIST)

Published: 2012