Abstract
Introducing noise into the training of machine learning systems is a powerful way to protect individual privacy via differential privacy guarantees, but it comes at a cost to utility. This work asks whether the inherent randomness of stochastic gradient descent (SGD) could itself contribute to privacy, effectively reducing the amount of additional noise required to achieve a given privacy guarantee. We conduct a large-scale empirical study to examine this question. Training a grid of over 120,000 models across four datasets (tabular and image) on convex and non-convex objectives, we demonstrate that the random seed has a larger impact on model weights than any individual training example. We test the distribution over weights induced by the seed, finding that the simple convex case can be modelled with a multivariate Gaussian posterior, while neural networks exhibit multi-modal and non-Gaussian weight distributions. By casting convex SGD as a Gaussian mechanism, we then estimate an ‘intrinsic’, data-dependent ϵ_i(𝒟), finding values as low as 6.3, dropping to 1.9 when using empirical estimates. We use a membership inference attack to estimate ϵ for non-convex SGD and demonstrate that hiding the random seed from the adversary yields a statistically significant reduction in attack performance, corresponding to a reduction in the effective ϵ. These results provide empirical evidence that SGD exhibits appreciable variability relative to its dataset sensitivity, and that this ‘intrinsic noise’ could be leveraged to improve the utility of privacy-preserving machine learning.
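As a rough illustration of the two quantities discussed in the abstract, the Python sketch below shows (i) how an ϵ can be read off by treating seed-induced weight variability as the noise of a Gaussian mechanism, using the classical calibration σ = Δ·sqrt(2·ln(1.25/δ))/ϵ, and (ii) the standard hypothesis-testing lower bound on ϵ implied by a membership inference attack's true/false positive rates. The function names and all numbers here are hypothetical and are not taken from the paper; the classical Gaussian-mechanism bound is also only formally valid for ϵ < 1, so this is a back-of-the-envelope conversion, not the paper's actual accounting procedure.

import math

def gaussian_mechanism_epsilon(sensitivity, sigma, delta=1e-5):
    # Invert the classical Gaussian-mechanism calibration
    #   sigma = sensitivity * sqrt(2 * ln(1.25 / delta)) / epsilon
    # to obtain an epsilon for an observed noise scale sigma.
    # Tighter analytic accountants exist; this is illustrative only.
    return sensitivity * math.sqrt(2.0 * math.log(1.25 / delta)) / sigma

def mia_epsilon_lower_bound(tpr, fpr, delta=1e-5):
    # Any (epsilon, delta)-DP mechanism forces every membership test
    # to satisfy TPR <= exp(epsilon) * FPR + delta, so an observed
    # attack operating point certifies epsilon >= ln((TPR - delta) / FPR).
    if fpr <= 0.0:
        return float("inf")
    if tpr <= delta:
        return 0.0
    return max(0.0, math.log((tpr - delta) / fpr))

# Illustrative numbers only (not from the paper): suppose removing one
# training example moves the weights by at most 0.05 in L2 norm, while
# the random seed alone induces weight noise of roughly 0.02.
print(gaussian_mechanism_epsilon(sensitivity=0.05, sigma=0.02))

# An attack achieving 60% TPR at 10% FPR would certify at least:
print(mia_epsilon_lower_bound(tpr=0.60, fpr=0.10))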