Fine-tuning large language models (LLMs) with local data is a widely adopted
approach for organizations seeking to adapt LLMs to their specific domains.
Given the shared characteristics in data across different organizations, the
idea of collaboratively fine-tuning an LLM using data from multiple sources
presents an appealing opportunity. However, organizations are often reluctant
to share local data, making centralized fine-tuning impractical. Federated
learning (FL), a privacy-preserving framework, enables clients to retain local
data while sharing only model parameters for collaborative training, offering a
potential solution. While fine-tuning LLMs on centralized datasets risks data
leakage through next-token prediction, the iterative aggregation process in FL
results in a global model that encapsulates generalized knowledge, which some
believe protects client privacy. In this paper, however, we present
contradictory findings through extensive experiments. We show that attackers
can still extract training data from the global model, even using
straightforward generation methods, with leakage increasing as the model size
grows. Moreover, we introduce an enhanced attack strategy tailored to FL, which
tracks global model updates during training to intensify privacy leakage. To
mitigate these risks, we evaluate privacy-preserving techniques in FL,
including differential privacy, regularization-constrained updates and adopting
LLMs with safety alignment. Our results provide valuable insights and practical
guidelines for reducing privacy risks when training LLMs with FL.