Abstract
Differentially private stochastic gradient descent (DP-SGD) is the canonical
approach to private deep learning. While the current privacy analysis of DP-SGD
is known to be tight in some settings, several empirical results suggest that
models trained on common benchmark datasets leak significantly less privacy for
many datapoints. Yet, despite past attempts, a rigorous explanation for why
this is the case has not been reached. Is it because there exist tighter
privacy upper bounds when restricted to these dataset settings, or are our
attacks not strong enough for certain datapoints? In this paper, we provide the
first per-instance (i.e., ``data-dependent") DP analysis of DP-SGD. Our
analysis captures the intuition that points with similar neighbors in the
dataset enjoy better data-dependent privacy than outliers. Formally, this is
done by modifying the per-step privacy analysis of DP-SGD to introduce a
dependence on the distribution of model updates computed from a training
dataset. We further develop a new composition theorem to effectively use this
new per-step analysis to reason about an entire training run. Putting it all
together, our evaluation shows that this novel analysis of DP-SGD formally
establishes that, when trained on common benchmarks, DP-SGD leaks
significantly less privacy for many datapoints than the current
data-independent guarantee implies. This
implies that privacy attacks will necessarily fail against many datapoints if the
adversary does not have sufficient control over the possible training datasets.
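To make the setting concrete, below is a minimal, illustrative sketch of a single DP-SGD step with per-example gradient clipping and Gaussian noise, written in plain NumPy for a toy linear model. It reflects only the standard, data-independent mechanism that the abstract refers to: the noise is calibrated to the worst-case sensitivity given by the clipping norm. The paper's per-instance analysis would instead reason about the distribution of updates produced by the actual training dataset; that accounting is not reproduced here. Names such as `dp_sgd_step`, `clip_norm`, and `noise_multiplier` are hypothetical choices for this example.

```python
# Illustrative sketch only (not the paper's analysis): one DP-SGD step with
# per-example clipping and Gaussian noise on a toy linear-regression model.
import numpy as np

def per_example_grads(w, X, y):
    """Per-example gradients of 0.5 * (x @ w - y)^2 for a linear model."""
    residuals = X @ w - y                      # shape (n,)
    return residuals[:, None] * X              # shape (n, d)

def dp_sgd_step(w, X, y, lr=0.1, clip_norm=1.0, noise_multiplier=1.0, rng=None):
    rng = rng if rng is not None else np.random.default_rng(0)
    grads = per_example_grads(w, X, y)
    # Clip each per-example gradient to L2 norm <= clip_norm; this bounds the
    # worst-case sensitivity of the summed update to clip_norm.
    norms = np.linalg.norm(grads, axis=1, keepdims=True)
    grads = grads * np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    # Add Gaussian noise calibrated to that worst-case sensitivity, then average.
    noisy_sum = grads.sum(axis=0) + rng.normal(
        scale=noise_multiplier * clip_norm, size=w.shape)
    return w - lr * noisy_sum / len(X)

# Toy usage: a few steps on synthetic data.
rng = np.random.default_rng(0)
X, y = rng.normal(size=(32, 5)), rng.normal(size=32)
w = np.zeros(5)
for _ in range(10):
    w = dp_sgd_step(w, X, y, rng=rng)
```

In the data-independent analysis, every such step is treated as a Gaussian mechanism with the same worst-case sensitivity, and the per-step guarantees are combined via composition; the abstract's per-instance analysis refines the per-step bound using the dataset-dependent distribution of updates and composes those refined bounds instead.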