These labels were automatically added by AI and may be inaccurate. For details, see About Literature Database.
Abstract
Retrieval-augmented generation (RAG) systems enhance large language models
(LLMs) by integrating external knowledge bases, but this advancement introduces
significant privacy risks. Existing privacy attacks on RAG systems can trigger
data leakage but often fail to accurately isolate knowledge-base-derived
sentences within mixed responses. They also lack robustness when applied across
multiple domains. This paper addresses these challenges by presenting a novel
black-box attack framework that exploits knowledge asymmetry between RAG and
standard LLMs to achieve fine-grained privacy extraction across heterogeneous
knowledge landscapes. We propose a chain-of-thought reasoning strategy that
creates adaptive prompts to steer RAG systems away from sensitive content.
Specifically, we first decompose adversarial queries to maximize information
disparity and then apply a semantic relationship scoring to resolve lexical and
syntactic ambiguities. We finally train a neural network on these feature
scores to precisely identify sentences containing private information. Unlike
prior work, our framework generalizes to unseen domains through iterative
refinement without pre-defined knowledge. Experimental results show that we
achieve over 91% privacy extraction rate in single-domain and 83% in
multi-domain scenarios, reducing sensitive sentence exposure by over 65% in
case studies. This work bridges the gap between attack and defense in RAG
systems, enabling precise extraction of private information while providing a
foundation for adaptive mitigation.