Abstract
With the rapid adoption of Federated Learning (FL) as the training and tuning
protocol for applications built on Large Language Models (LLMs), recent
research has highlighted the need for significant modifications to FL to
accommodate the large scale of LLMs. While substantial adjustments to the
protocol have been introduced in response, a comprehensive privacy analysis of
the adapted FL protocol is currently lacking.
To address this gap, our work provides an extensive privacy analysis of FL
when used for training LLMs, from both theoretical and practical perspectives.
In particular, we design two active membership inference attacks with
guaranteed theoretical success rates to assess the privacy leakage of various
adapted FL configurations. Our theoretical findings translate into practical
attacks that reveal substantial privacy vulnerabilities in popular LLMs,
including BERT, RoBERTa, DistilBERT, and OpenAI's GPTs, across multiple
real-world language datasets. Additionally, we conduct thorough experiments to
evaluate the privacy leakage of these models when data is protected by
state-of-the-art differential privacy (DP) mechanisms.
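For context on the DP mechanisms referenced above, the sketch below illustrates the standard per-example gradient clipping and Gaussian noising step used in DP-SGD, the most common DP defense for model training. This is a minimal generic illustration, not the paper's attack or its exact experimental configuration; the function name and parameters (`gaussian_mechanism`, `clip_norm`, `noise_multiplier`) are hypothetical.

```python
import numpy as np

def gaussian_mechanism(gradients, clip_norm=1.0, noise_multiplier=1.0, rng=None):
    """Clip per-example gradients and add calibrated Gaussian noise.

    Illustrative DP-SGD-style step (not the paper's exact setup): each
    gradient's L2 norm is bounded by clip_norm, then Gaussian noise with
    standard deviation noise_multiplier * clip_norm is added to the sum.
    """
    rng = np.random.default_rng() if rng is None else rng
    clipped = []
    for g in gradients:
        norm = np.linalg.norm(g)
        # Scale down any gradient whose L2 norm exceeds clip_norm.
        clipped.append(g * min(1.0, clip_norm / max(norm, 1e-12)))
    aggregate = np.sum(clipped, axis=0)
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=aggregate.shape)
    return (aggregate + noise) / len(gradients)

# Toy usage: three per-example gradients of a 4-parameter model.
grads = [np.random.randn(4) for _ in range(3)]
print(gaussian_mechanism(grads, clip_norm=1.0, noise_multiplier=1.0))
```

The clipping bounds each example's influence on the update (its sensitivity), which is what lets the added Gaussian noise yield a formal DP guarantee; the paper's experiments evaluate how much membership-inference risk remains under such protections.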