Abstract
Large language models (LLMs) have significantly transformed natural language
understanding and generation, but they raise privacy concerns due to potential
exposure of sensitive information. Studies have highlighted the risk of
information leakage, where adversaries can extract sensitive information
embedded in the prompts. In this work, we introduce a novel private prediction
framework for generating high-quality synthetic text with strong privacy
guarantees. Our approach leverages the Differential Privacy (DP) framework to
ensure worst-case theoretical bounds on information leakage without requiring
any fine-tuning of the underlying models. The proposed method performs
inference on private records and aggregates the resulting per-token output
distributions. This enables the generation of longer, coherent synthetic
text while preserving the privacy guarantees. Additionally, we propose a simple
blending operation that combines private and public inference to further
enhance utility. Empirical evaluations demonstrate that our approach
outperforms previous state-of-the-art methods on in-context learning (ICL)
tasks, making it a promising direction for privacy-preserving text generation
that retains high utility.
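
The aggregation and blending steps described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's algorithm: the function names, the plain averaging rule, the convex-combination blend, and the weight `lam` are all assumptions made for illustration, and the toy distributions stand in for real per-token LLM outputs. The sketch also omits whatever clipping, noise, or sampling calibration is needed to obtain a formal DP guarantee, so on its own it provides no privacy protection.

```python
import random

def aggregate_distributions(per_record_probs):
    """Average the next-token distributions obtained from each private record.

    Assumption: a plain mean is used here; the paper's aggregation rule may differ.
    """
    n = len(per_record_probs)
    vocab_size = len(per_record_probs[0])
    return [sum(p[i] for p in per_record_probs) / n for i in range(vocab_size)]

def blend(private_probs, public_probs, lam):
    """Mix private and public distributions (hypothetical convex combination).

    lam = 1.0 uses only the private aggregate; lam = 0.0 uses only the public one.
    """
    return [lam * p + (1.0 - lam) * q for p, q in zip(private_probs, public_probs)]

def sample_token(probs, rng):
    """Draw one token index according to the given distribution."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

# Toy example over a 3-token vocabulary (made-up numbers).
private = [[0.7, 0.2, 0.1], [0.5, 0.3, 0.2]]  # one distribution per private record
public = [0.2, 0.2, 0.6]                      # distribution from a public-only prompt

agg = aggregate_distributions(private)
mixed = blend(agg, public, lam=0.5)
token = sample_token(mixed, random.Random(0))
```

Repeating the three steps once per generated token yields a full synthetic sequence; in a real pipeline each step would also consume part of the privacy budget.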