Abstract
For small privacy parameter $\epsilon$, $\epsilon$-differential privacy (DP)
provides a strong worst-case guarantee that no membership inference attack
(MIA) can succeed at determining whether a person's data was used to train a
machine learning model. The guarantee of DP is worst-case because: a) it holds
even if the attacker already knows the records of all but one person in the
data set; and b) it holds uniformly over all data sets. In practical
applications, such a worst-case guarantee may be overkill: practical attackers
may lack exact knowledge of (nearly all of) the private data, and our data set
might be easier to defend, in some sense, than the worst-case data set. Such
considerations have motivated the industrial deployment of DP models with large
privacy parameter (e.g., $\epsilon \geq 7$), and it has been observed
empirically that DP with large $\epsilon$ can successfully defend against
state-of-the-art MIAs. Existing DP theory cannot explain these empirical
findings: e.g., the theoretical privacy guarantees of $\epsilon \geq 7$ are
essentially vacuous. In this paper, we aim to close this gap between theory and
practice and understand why a large DP parameter can prevent practical MIAs. To
tackle this problem, we propose a new privacy notion called practical
membership privacy (PMP). PMP models a practical attacker's uncertainty about
the contents of the private data. The PMP parameter has a natural
interpretation in terms of the success rate of a practical MIA on a given data
set. We quantitatively analyze the PMP parameter of two fundamental DP
mechanisms: the exponential mechanism and the Gaussian mechanism. Our analysis
reveals that a large DP parameter often translates into a much smaller PMP
parameter, which guarantees strong privacy against practical MIAs. Using our
findings, we offer principled guidance for practitioners in choosing the DP
parameter.
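For context, a standard formulation of the $\epsilon$-DP guarantee (a textbook statement, included here only to make the worst-case bound concrete) requires, for every pair of neighboring data sets $D, D'$ differing in one person's record and every set of outputs $S$,
$$\Pr[M(D) \in S] \;\le\; e^{\epsilon} \, \Pr[M(D') \in S].$$
In the hypothesis-testing view of DP, this caps any MIA's true-positive rate at $e^{\epsilon}$ times its false-positive rate; at $\epsilon = 7$ that factor is $e^{7} \approx 1096$, which is why the worst-case guarantee at such values is essentially vacuous.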