Pre-training exploits public datasets to build an advanced machine learning model that can then be easily fine-tuned to adapt to various downstream tasks. Pre-training has also been extensively explored as a way to reduce computation and communication resource consumption. Inspired by these
advantages, we are the first to explore how model pre-training can mitigate the detrimental effect of noise in differentially private federated learning (DPFL). DPFL extends federated learning (FL), the de facto standard for privacy preservation when training a model across multiple clients that own private data. DPFL injects differentially private (DP) noise to obfuscate the model gradients exposed in FL, which, however, can considerably impair model accuracy.
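To make the mechanism concrete, the sketch below shows the per-update clipping and Gaussian noise injection typically used to achieve DP in this setting; the clip norm and noise multiplier are illustrative assumptions, not values from our experiments.

```python
import torch

def privatize_update(update, clip_norm=1.0, noise_multiplier=1.0):
    """Clip a client's model update and add calibrated Gaussian noise.

    `clip_norm` and `noise_multiplier` are illustrative defaults; in
    practice they are derived from the privacy budget (epsilon, delta).
    """
    # Overall L2 norm across all tensors in the update.
    total_norm = torch.norm(torch.stack([t.norm(2) for t in update]), 2)
    # Scale the update so its L2 norm is at most `clip_norm`.
    scale = min(1.0, clip_norm / (total_norm.item() + 1e-12))
    # Gaussian noise whose scale is tied to the clipping bound.
    return [t * scale + torch.randn_like(t) * noise_multiplier * clip_norm
            for t in update]
```

Every round in which a client uploads such a privatized update adds noise; hence, fewer gradient exposures mean less accumulated noise.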
In our work, we compare head fine-tuning (HT) and full fine-tuning (FT), both of which build on pre-training, against scratch training (ST) in DPFL through a comprehensive empirical study. Our experiments fine-tune models pre-trained on ImageNet-1K on each of the CIFAR-10, CHMNIST, and Fashion-MNIST (FMNIST) datasets. The results demonstrate that HT
and FT can significantly mitigate the influence of noise by reducing the number of times gradients are exposed. In particular, HT outperforms FT when the privacy budget is tight or the model size is large. A visualization and explanation study further substantiates these findings. Our pioneering study introduces a new perspective
on enhancing DPFL and expanding its practical applications.
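As a concrete illustration of the HT and FT regimes compared above, a minimal PyTorch sketch is given below; the ResNet-18 backbone and the layer name `fc` are hypothetical choices for illustration, not the exact architectures used in the study.

```python
import torch.nn as nn
from torchvision.models import resnet18

def build_model(mode="HT", num_classes=10):
    """Hypothetical setup: an ImageNet-1K pre-trained backbone."""
    model = resnet18(weights="IMAGENET1K_V1")
    # Replace the classification head for the downstream task.
    model.fc = nn.Linear(model.fc.in_features, num_classes)
    if mode == "HT":
        # Head fine-tuning: freeze the backbone so only the small head
        # produces gradients, shrinking what the DP noise must perturb.
        for name, param in model.named_parameters():
            param.requires_grad = name.startswith("fc")
    # mode == "FT": every parameter remains trainable.
    return model
```

Under HT, the DP mechanism perturbs only the head's gradients, which matches the intuition that smaller and fewer exposed gradients suffer less from noise.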