Measuring Physical-World Privacy Awareness of Large Language Models: An Evaluation Benchmark

Information in the literature database is collected automatically.

Source

https://arxiv.org/abs/2510.02356

PDF

https://arxiv.org/pdf/2510.02356

Paper Information

Authors: Xinjie Shen, Mufei Li, Pan Li
Published: 9-28-2025
Updated: 10-14-2025
Affiliation: Georgia Tech
Country: United States of America
Conference: Computing Research Repository (CoRR)

Labels Estimated by AI

Privacy Enhancing Technology, Hallucination, Ethical Choice Evaluation

These labels were automatically added by AI and may be inaccurate.
For details, see About Literature Database.

Abstract

The deployment of Large Language Models (LLMs) in embodied agents creates an urgent need to measure their privacy awareness in the physical world. Existing evaluation methods, however, are confined to natural-language-based scenarios. To bridge this gap, we introduce EAPrivacy, a comprehensive evaluation benchmark designed to quantify the physical-world privacy awareness of LLM-powered agents. EAPrivacy uses procedurally generated scenarios across four tiers to test an agent's ability to handle sensitive objects, adapt to changing environments, balance task execution with privacy constraints, and resolve conflicts with social norms. Our measurements reveal a critical deficit in current models. The top-performing model, Gemini 2.5 Pro, achieved only 59% accuracy in scenarios involving changing physical environments. Furthermore, when a task was accompanied by a privacy request, models prioritized completion over the constraint in up to 86% of cases. In high-stakes situations pitting privacy against critical social norms, leading models like GPT-4o and Claude-3.5-haiku disregarded the social norm over 15% of the time. These findings, demonstrated by our benchmark, underscore a fundamental misalignment in LLMs regarding physically grounded privacy and establish the need for more robust, physically aware alignment. Code and datasets will be available at https://github.com/Graph-COM/EAPrivacy.
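
The percentages quoted in the abstract are per-tier rates over evaluated scenarios. As a minimal sketch of how such rates could be tallied, the Python below groups outcomes by tier and computes the fraction flagged in each. It is not the authors' evaluation code: the function name `rate_by_tier`, the tier labels, and the toy records are all hypothetical and do not come from the paper or its repository.

```python
from collections import defaultdict


def rate_by_tier(records):
    """Return {tier: fraction of records in that tier whose flag is True}.

    `records` is an iterable of (tier, flag) pairs. The flag could mean
    "answered correctly" or "violated the privacy request", depending on
    which rate is being measured.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for tier, flag in records:
        totals[tier] += 1
        hits[tier] += int(flag)
    return {tier: hits[tier] / totals[tier] for tier in totals}


# Hypothetical toy records for illustration only; NOT results from the paper.
toy_records = [
    ("tier2", True),
    ("tier2", False),
    ("tier3", True),
    ("tier3", True),
]
rates = rate_by_tier(toy_records)
```

Keeping the flag generic lets the same helper report either an accuracy (flag means correct) or a violation rate (flag means the constraint was ignored).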

External Datasets

EAPrivacy benchmark