Why Does Differential Privacy with Large Epsilon Defend Against Practical Membership Inference Attacks? | AI Security Portal


arXiv

Why Does Differential Privacy with Large Epsilon Defend Against Practical Membership Inference Attacks?

AI Security Portal bot

The information in the literature database is collected automatically.

Source

https://arxiv.org/abs/2402.09540

PDF

https://arxiv.org/pdf/2402.09540

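Paper information

Authors: Andrew Lowy; Zhuohang Li; Jing Liu; Toshiaki Koike-Akino; Kieran Parsons; Ye Wang
Published: 2024-02-15
Affiliation: University of Wisconsin-Madison
Country of affiliation: United States of America
Conference:

Labels estimated by AI

Membership inference; Privacy protection methods; Privacy protection

Note: These labels were added automatically by AI and may therefore be inaccurate.
For details, see "About the Literature Database".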

Abstract

For small privacy parameter $\epsilon$, $\epsilon$-differential privacy (DP) provides a strong worst-case guarantee that no membership inference attack (MIA) can succeed at determining whether a person's data was used to train a machine learning model. The guarantee of DP is worst-case because: a) it holds even if the attacker already knows the records of all but one person in the data set; and b) it holds uniformly over all data sets. In practical applications, such a worst-case guarantee may be overkill: practical attackers may lack exact knowledge of (nearly all of) the private data, and our data set might be easier to defend, in some sense, than the worst-case data set. Such considerations have motivated the industrial deployment of DP models with large privacy parameter (e.g. $\epsilon \geq 7$), and it has been observed empirically that DP with large $\epsilon$ can successfully defend against state-of-the-art MIAs. Existing DP theory cannot explain these empirical findings: e.g., the theoretical privacy guarantees of $\epsilon \geq 7$ are essentially vacuous. In this paper, we aim to close this gap between theory and practice and understand why a large DP parameter can prevent practical MIAs. To tackle this problem, we propose a new privacy notion called practical membership privacy (PMP). PMP models a practical attacker's uncertainty about the contents of the private data. The PMP parameter has a natural interpretation in terms of the success rate of a practical MIA on a given data set. We quantitatively analyze the PMP parameter of two fundamental DP mechanisms: the exponential mechanism and Gaussian mechanism. Our analysis reveals that a large DP parameter often translates into a much smaller PMP parameter, which guarantees strong privacy against practical MIAs. Using our findings, we offer principled guidance for practitioners in choosing the DP parameter.
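To illustrate why the abstract calls the guarantee of $\epsilon \geq 7$ essentially vacuous, the following minimal sketch evaluates the standard hypothesis-testing bound for pure $\epsilon$-DP: with a balanced prior (member and non-member equally likely), no membership test can exceed accuracy $e^{\epsilon}/(1+e^{\epsilon})$. This is the classical worst-case DP bound, not the paper's PMP analysis; the function name `dp_mia_accuracy_bound` and the balanced-prior assumption are illustrative choices made here.

```python
import math


def dp_mia_accuracy_bound(epsilon: float) -> float:
    """Upper bound on membership-inference accuracy against a pure epsilon-DP mechanism.

    Assumes a balanced prior: the target record is a member with probability 1/2.
    The bound is e^eps / (1 + e^eps).
    """
    if epsilon < 0:
        raise ValueError("epsilon must be non-negative")
    # Written as a logistic function, which is numerically stable for large epsilon.
    return 1.0 / (1.0 + math.exp(-epsilon))


if __name__ == "__main__":
    # Compare a few small and large privacy parameters.
    for eps in (0.1, 1.0, 3.0, 7.0):
        bound = dp_mia_accuracy_bound(eps)
        print(f"epsilon = {eps:>4}: worst-case MIA accuracy <= {bound:.4f}")
```

At $\epsilon = 0$ the bound is 0.5 (random guessing), while at $\epsilon = 7$ it exceeds 0.999, so the worst-case guarantee by itself rules out almost nothing. This is the gap between theory and practice that the paper sets out to explain.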

External datasets

MNIST


References

Differential Privacy Overview

Apple

Published: 2016

International Conference on Machine Learning

Improving the Gaussian mechanism for differential privacy: Analytical calibration and optimal denoising

Borja Balle, Yu-Xiang Wang

Published: 2018

2013 IEEE 54th Annual Symposium on Foundations of Computer Science

Coupled-worlds privacy: Exploiting adversarial uncertainty in statistical data privacy

Bassily, R., Groce, A., Katz, J., Smith, A.

Published: 2013

Advances in Cryptology–ASIACRYPT 2011

Noiseless database privacy

Bhaskar, R., Bhowmick, A., Goyal, V., Laxman, S., Thakurta, A.

Published: 2011

2022 IEEE Symposium on Security and Privacy (SP)

Membership inference attacks from first principles

Carlini, N., Chien, S., Nasr, M., Song, S., Terzis, A., Tramer, F.

Published: 2022

USENIX Security Symposium

Extracting Training Data from Large Language Models

Carlini, N., Tramer, F., Wallace, E., Jagielski, M., Herbert-Voss, A., Lee, K., Roberts, A., Brown, T. B., Song, D., Erlingsson, U.

Published: 2021

A list of real-world uses of differential privacy

Desfontaines, D.

Published: 2021

Advances in Neural Information Processing Systems

Collecting telemetry data privately

Ding, B., Kulkarni, J., Yekhanin, S.

Published: 2017

Theory of Cryptography

Calibrating noise to sensitivity in private data analysis

Cynthia Dwork, Frank McSherry, Kobbi Nissim, Adam Smith

Published: 2006

2015 IEEE 56th Annual Symposium on Foundations of Computer Science

Robust traceability from trace amounts

Dwork, C., Smith, A., Steinke, T., Ullman, J., Vadhan, S.

Published: 2015

Algorithms with More Granular Differential Privacy Guarantees

Ghazi, B., Kumar, R., Manurangsi, P., Steinke, T.

Published: 2022

Bounding Training Data Reconstruction in DP-SGD

Hayes, J., Mahloujifar, S., Balle, B.

Published: 2023

PLOS Genetics

Resolving individuals contributing trace amounts of DNA to highly complex mixtures using high-density SNP genotyping microarrays

N. Homer, S. Szelinger, M. Redman, D. Duggan, W. Tembe, J. Muehling, J. V. Pearson, D. A. Stephan, S. F. Nelson, D. W. Craig

Published: 2008

Differentially private learning does not bound membership inference

Thomas Humphries, Matthew Rafuse, Lindsey Tulloch, Simon Oya, Ian Goldberg, Florian Kerschbaum

Published: 2020

Trans. Mach. Learn. Res.

Provable Membership Inference Privacy

Zachary Izzo, Jinsung Yoon, Sercan O. Arik, James Zou

Published: 2022-11-12

In applications involving sensitive data, such as finance and healthcare, the necessity for preserving data privacy can be a significant barrier to machine learning model development. Differential privacy (DP) has emerged as one canonical standard for provable privacy. However, DP's strong theoretical guarantees often come at the cost of a large drop in its utility for machine learning, and DP guarantees themselves can be difficult to interpret. In this work, we propose a novel privacy notion, membership inference privacy (MIP), to address these challenges. We give a precise characterization of the relationship between MIP and DP, and show that MIP can be achieved using less amount of randomness compared to the amount required for guaranteeing DP, leading to a smaller drop in utility. MIP guarantees are also easily interpretable in terms of the success rate of membership inference attacks. Our theoretical results also give rise to a simple algorithm for guaranteeing MIP which can be used as a wrapper around any algorithm with a continuous output, including parametric model training.

Membership disclosure risk; Privacy protection methods; Privacy evaluation

SIAM Journal on Computing

What can we learn privately?

Kasiviswanathan, S. P., Lee, H. K., Nissim, K., Raskhodnikova, S., Smith, A.

Published: 2011

Proceedings of the 31st ACM SIGMOD-SIGACT-SIGAI symposium on Principles of Database Systems

A rigorous and customizable framework for privacy

Kifer, D., Machanavajjhala, A.

Published: 2012

Conference on Neural Information Processing Systems (NeurIPS)

Gaussian Membership Inference Privacy

Tobias Leemann, Martin Pawelczyk, Gjergji Kasneci

Published: 2023-06-13

We propose a novel and practical privacy notion called $f$-Membership Inference Privacy ($f$-MIP), which explicitly considers the capabilities of realistic adversaries under the membership inference attack threat model. Consequently, $f$-MIP offers interpretable privacy guarantees and improved utility (e.g., better classification accuracy). In particular, we derive a parametric family of $f$-MIP guarantees that we refer to as $\mu$-Gaussian Membership Inference Privacy ($\mu$-GMIP) by theoretically analyzing likelihood ratio-based membership inference attacks on stochastic gradient descent (SGD). Our analysis highlights that models trained with standard SGD already offer an elementary level of MIP. Additionally, we show how $f$-MIP can be amplified by adding noise to gradient updates. Our analysis further yields an analytical membership inference attack that offers two distinct advantages over previous approaches. First, unlike existing state-of-the-art attacks that require training hundreds of shadow models, our attack does not require any shadow model. Second, our analytical attack enables straightforward auditing of our privacy notion $f$-MIP. Finally, we quantify how various hyperparameters (e.g., batch size, number of model parameters) and specific data characteristics determine an attacker's ability to accurately infer a point's membership in the training set. We demonstrate the effectiveness of our method on models trained on vision and tabular datasets.

Privacy methods; Hypothesis testing; Statistical testing

Proceedings of the 7th ACM Symposium on Information, Computer and Communications Security

On sampling, anonymization, and differential privacy or, k-anonymization meets differential privacy

Li, N., Qardaji, W., Su, D.

Published: 2012

Proceedings of the 2013 ACM SIGSAC conference on Computer & communications security

Membership privacy: A unifying framework for privacy definitions

Li, N., Qardaji, W., Su, D., Wu, Y., Yang, W.

Published: 2013

Computing Research Repository (CoRR)

Towards Measuring Membership Privacy

Yunhui Long, Vincent Bindschaedler, Carl A. Gunter

Published: 2017-12-26

Machine learning models are increasingly made available to the masses through public query interfaces. Recent academic work has demonstrated that malicious users who can query such models are able to infer sensitive information about records within the training data. Differential privacy can thwart such attacks, but not all models can be readily trained to achieve this guarantee or to achieve it with acceptable utility loss. As a result, if a model is trained without differential privacy guarantee, little is known or can be said about the privacy risk of releasing it. In this work, we investigate and analyze membership attacks to understand why and how they succeed. Based on this understanding, we propose Differential Training Privacy (DTP), an empirical metric to estimate the privacy risk of publishing a classifier when methods such as differential privacy cannot be applied. DTP is a measure of a classifier with respect to its training dataset, and we show that calculating DTP is efficient in many practical cases. We empirically validate DTP using state-of-the-art machine learning models such as neural networks trained on real-world datasets. Our results show that DTP is highly predictive of the success of membership attacks and therefore reducing DTP also reduces the privacy risk. We advocate for DTP to be used as part of the decision-making process when considering publishing a classifier. To this end, we also suggest adopting the DTP-1 hypothesis: if a classifier has a DTP value above 1, it should not be published.

Privacy risk management; Membership inference; Privacy protection in machine learning

Optimal membership inference bounds for adaptive composition of sampled Gaussian mechanisms

Saeed Mahloujifar, Alexandre Sablayrolles, Graham Cormode, Somesh Jha

Published: 2022

48th Annual IEEE Symposium on Foundations of Computer Science (FOCS’07)

Mechanism design via differential privacy

McSherry, F., Talwar, K.

Published: 2007

Annual International Cryptology Conference

Computational differential privacy

Mironov, I., Pandey, O., Reingold, O., Vadhan, S.

Published: 2009

Citations: 10

International Conference on Machine Learning (ICML)

White-box vs Black-box: Bayes Optimal Strategies for Membership Inference

Alexandre Sablayrolles, Matthijs Douze, Yann Ollivier, Cordelia Schmid, Hervé Jégou

Published: 2019-08-29

Membership inference determines, given a sample and trained parameters of a machine learning model, whether the sample was part of the training set. In this paper, we derive the optimal strategy for membership inference with a few assumptions on the distribution of the parameters. We show that optimal attacks only depend on the loss function, and thus black-box attacks are as good as white-box attacks. As the optimal strategy is not tractable, we provide approximations of it leading to several inference methods, and show that existing membership inference methods are coarser approximations of this optimal strategy. Our membership attacks outperform the state of the art in various settings, ranging from a simple logistic regression to more complex architectures and datasets, such as ResNet-101 and Imagenet.

Membership inference; Sample complexity; Difficulty calibration

f-divergence inequalities

Sason, I., Verdu, S.

Published: 2016

Proceedings of the 22nd ACM SIGSAC conference on computer and communications security

Privacy-preserving deep learning

Shokri, R., Shmatikov, V.

Published: 2015

Proceedings of the 21st ACM Conference on Computer and Communications Security

RAPPOR: Randomized Aggregatable Privacy-Preserving Ordinal Response

Erlingsson, Ú., Pihur, V., Korolova, A.

Published: 2014

Citations: 22

IEEE Computer Security Foundations Symposium (CSF)

Privacy Risk in Machine Learning: Analyzing the Connection to Overfitting

Samuel Yeom, Irene Giacomelli, Matt Fredrikson, Somesh Jha

Published: 2017-09-06

Machine learning algorithms, when applied to sensitive data, pose a distinct threat to privacy. A growing body of prior work demonstrates that models produced by these algorithms may leak specific private information in the training data to an attacker, either through the models' structure or their observable behavior. However, the underlying cause of this privacy risk is not well understood beyond a handful of anecdotal accounts that suggest overfitting and influence might play a role. This paper examines the effect that overfitting and influence have on the ability of an attacker to learn information about the training data from machine learning models, either through training set membership inference or attribute inference attacks. Using both formal and empirical analyses, we illustrate a clear relationship between these factors and the privacy risk that arises in several popular machine learning algorithms. We find that overfitting is sufficient to allow an attacker to perform membership inference and, when the target attribute meets certain conditions about its influence, attribute inference attacks. Interestingly, our formal analysis also shows that overfitting is not necessary for these attacks and begins to shed light on what other factors may be in play. Finally, we explore the connection between membership inference and attribute inference, showing that there are deep connections between the two that lead to effective new attacks.

Privacy leakage; Membership inference; Privacy analysis