These labels were automatically added by AI and may be inaccurate. For details, see About Literature Database.
Abstract
The success of large language models (LLMs) facilitate many parties to
fine-tune LLMs on their own private data. However, this practice raises privacy
concerns due to the memorization of LLMs. Existing solutions, such as utilizing
synthetic data for substitution, struggle to simultaneously improve performance
and preserve privacy. They either rely on a local model for generation,
resulting in a performance decline, or take advantage of APIs, directly
exposing the data to API servers. To address this issue, we propose
KnowledgeSG, a novel client-server framework which enhances synthetic data
quality and improves model performance while ensuring privacy. We achieve this
by learning local knowledge from the private data with differential privacy
(DP) and distilling professional knowledge from the server. Additionally,
inspired by federated learning, we transmit models rather than data between the
client and server to prevent privacy leakage. Extensive experiments in medical
and financial domains demonstrate the effectiveness of KnowledgeSG. Our code is
now publicly available at https://github.com/wwh0411/KnowledgeSG.