Abstract
Differential Privacy (DP) provides a formal privacy guarantee preventing adversaries with access to a machine learning model from extracting information about individual training points. Differentially Private Stochastic Gradient Descent (DP-SGD), the most popular DP training method for deep learning, realizes this protection by injecting noise during training. However, previous works have found that DP-SGD often leads to a significant degradation in performance on standard image classification benchmarks. Furthermore, some authors have postulated that DP-SGD inherently performs poorly on large models, since the norm of the noise required to preserve privacy is proportional to the model dimension. In contrast, we demonstrate that DP-SGD on over-parameterized models can perform significantly better than previously thought. Combining careful hyper-parameter tuning with simple techniques to ensure signal propagation and improve the convergence rate, we obtain a new SOTA without extra data on CIFAR-10 of 81.4% under (8, 10^-5)-DP using a 40-layer Wide-ResNet, improving over the previous SOTA of 71.7%. When fine-tuning a pre-trained NFNet-F3, we achieve a remarkable 83.8% top-1 accuracy on ImageNet under (0.5, 8*10^-7)-DP. Additionally, we also achieve 86.7% top-1 accuracy under (8, 8*10^-7)-DP, which is just 4.3% below the current non-private SOTA for this task. We believe our results are a significant step towards closing the accuracy gap between private and non-private image classification.
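To make the mechanism concrete, below is a minimal sketch of a single DP-SGD update, showing the per-example gradient clipping and Gaussian noise injection the abstract refers to. The function name, array layout, and default hyper-parameters are illustrative assumptions for this sketch, not details taken from the paper.

```python
import numpy as np

def dp_sgd_step(params, per_example_grads, lr=0.1, clip_norm=1.0,
                noise_multiplier=1.0, rng=None):
    """One illustrative DP-SGD update (hypothetical helper, not the paper's code).

    per_example_grads: assumed shape (batch_size, num_params), one flattened
    gradient per training example.
    """
    rng = rng or np.random.default_rng()
    batch_size, num_params = per_example_grads.shape
    # Clip each example's gradient to L2 norm at most clip_norm, so no single
    # training point can dominate the update.
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    clipped = per_example_grads * np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    # Add isotropic Gaussian noise whose scale is tied to the clipping norm;
    # the noise vector's norm grows with the number of parameters, which is
    # the dimension-dependence the abstract mentions.
    noise = rng.normal(scale=noise_multiplier * clip_norm, size=num_params)
    noisy_mean = (clipped.sum(axis=0) + noise) / batch_size
    # Plain SGD step on the privatized gradient estimate.
    return params - lr * noisy_mean
```

Under this formulation, a larger model means a higher-dimensional noise vector for the same clipping norm, which is the intuition behind the concern about large models that the paper's results push back against.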