Abstract
Recent advances in multi-modal Large Language Models (M-LLMs) have
demonstrated a powerful ability to synthesize implicit information from
disparate sources, including images and text. Applied to the rich data
available on social media, this ability also introduces a significant and
underexplored privacy risk: the inference of sensitive personal attributes
from seemingly mundane media content.
However, the lack of benchmarks and comprehensive evaluations of
state-of-the-art M-LLM capabilities hinders research on private attribute
profiling on social media. Accordingly, we propose (1) PRISM, the first
multi-modal, multi-dimensional and fine-grained synthesized dataset
incorporating a comprehensive privacy landscape and dynamic user history; (2)
an efficient evaluation framework that measures the cross-modal privacy
inference capabilities of advanced M-LLMs. Specifically, PRISM is a large-scale
synthetic benchmark designed to evaluate cross-modal privacy risks. Its key
feature is a set of 12 sensitive attribute labels spanning a diverse set of
multi-modal profiles, which enables targeted privacy analysis. These profiles are generated
via a sophisticated LLM agentic workflow, governed by a prior distribution to
ensure they realistically mimic social media users. Additionally, we propose a
Multi-Agent Inference Framework that leverages a pipeline of specialized LLMs
to enhance evaluation capabilities. We evaluate the inference capabilities of
six leading M-LLMs (Qwen, Gemini, GPT-4o, GLM, Doubao, and Grok) on PRISM. The
comparison with human performance reveals that these M-LLMs significantly
outperform humans in both accuracy and efficiency, highlighting the privacy
threat they pose and the urgent need for robust defenses.