Bibliographic Information
- Authors
  - Shuchao Pang, Zhigang Lu, Haichen Wang, Peng Fu, Yongbin Zhou, Minhui Xue
- Publication date
  - 2025-09-20
- Affiliation
  - Nanjing University of Science and Technology
- Country
  - China
- Conference
  - International Symposium on Research in Attacks, Intrusions and Defenses (RAID)
Abstract
Differential privacy (DP) is the de facto privacy standard against privacy
leakage attacks, including many recently discovered attacks against large
language models (LLMs). However, we discovered that LLMs can reconstruct the
altered or removed private information from DP-sanitized prompts. We propose
two attacks (black-box and white-box) based on the level of access to the LLM
and show that LLMs can connect a DP-sanitized text with the corresponding
private training data when given sample text pairs as instructions (in the
black-box attacks) or as fine-tuning data (in the white-box attacks). To
illustrate our findings, we conduct comprehensive experiments on modern LLMs
(e.g., LLaMA-2, LLaMA-3, ChatGPT-3.5, ChatGPT-4, ChatGPT-4o, Claude-3,
Claude-3.5, OPT, GPT-Neo, GPT-J, Gemma-2, and Pythia) using commonly used
datasets (such as WikiMIA, Pile-CC, and Pile-Wiki) against both word-level and
sentence-level DP. The experimental results show promising recovery rates; for
example, the black-box attacks against word-level DP on the WikiMIA dataset
achieved 72.18% on LLaMA-2 (70B), 82.39% on LLaMA-3 (70B), 75.35% on Gemma-2,
91.2% on ChatGPT-4o, and 94.01% on Claude-3.5 (Sonnet). More urgently, this
study indicates that these well-known LLMs have emerged as a new security risk
for existing DP text sanitization approaches.
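
The black-box attack described in the abstract amounts to few-shot prompting: the adversary supplies example pairs of (DP-sanitized text, original text) as instructions and asks the target LLM to recover the original form of a new sanitized input. Below is a minimal sketch of that setup, assuming the official `openai` Python client (v1+); the demo pairs, prompt wording, and query sentence are hypothetical placeholders, not the paper's actual prompt template or data.

```python
# Minimal sketch of the black-box reconstruction attack: few-shot prompting
# with (sanitized, original) example pairs. All concrete strings below are
# hypothetical illustrations.

from openai import OpenAI  # assumes the official openai>=1.0 client

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical instruction pairs mapping a word-level DP-sanitized sentence
# back to its original form (known to the attacker for the demo pairs only).
demo_pairs = [
    ("the patient visited berlin hospital in 1998",
     "the patient visited munich hospital in 1995"),
    ("alice transferred 300 dollars to bob on friday",
     "alice transferred 250 dollars to bob on monday"),
]

def build_messages(sanitized_query: str) -> list[dict]:
    """Assemble a few-shot chat prompt from sanitized/original example pairs."""
    messages = [{
        "role": "system",
        "content": ("Each noisy sentence below was produced by randomly "
                    "replacing words in an original sentence. Recover the "
                    "most likely original sentence."),
    }]
    for sanitized, original in demo_pairs:
        messages.append({"role": "user", "content": f"Noisy: {sanitized}"})
        messages.append({"role": "assistant", "content": f"Original: {original}"})
    messages.append({"role": "user", "content": f"Noisy: {sanitized_query}"})
    return messages

response = client.chat.completions.create(
    model="gpt-4o",  # one of the black-box targets named in the abstract
    messages=build_messages("the doctor saw the patient in paris in 2003"),
    temperature=0.0,  # deterministic decoding for reproducible recovery rates
)
print(response.choices[0].message.content)
```

In the white-box setting, the same sanitized/original pairs would instead serve as supervised fine-tuning data, after which the fine-tuned model is queried directly with the sanitized text.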