On the Query Complexity of Training Data Reconstruction in Private Learning

Source

https://arxiv.org/abs/2303.16372

PDF

https://arxiv.org/pdf/2303.16372

Paper Information

Authors: Prateeti Mukherjee; Satya Lokam
Published: 3-29-2023
Updated: 1-12-2024
Affiliation: Microsoft Research
Country: United States of America

Labels Estimated by AI

Privacy Protection Method Privacy Assessment Privacy Analysis

These labels were automatically added by AI and may be inaccurate.

Abstract

We analyze the number of queries that a whitebox adversary needs to make to a private learner in order to reconstruct its training data. For $(\epsilon, \delta)$-DP learners with training data drawn from any arbitrary compact metric space, we provide the first known lower bounds on the adversary's query complexity as a function of the learner's privacy parameters. Our results are minimax optimal for every $\epsilon \geq 0, \delta \in [0, 1]$, covering both $\epsilon$-DP and $(0, \delta)$-DP as corollaries. Beyond this, we obtain query complexity lower bounds for $(\alpha, \epsilon)$-Rényi DP learners that are valid for any $\alpha > 1, \epsilon \geq 0$. Finally, we analyze data reconstruction attacks on locally compact metric spaces via the framework of Metric DP, a generalization of DP that accounts for the underlying metric structure of the data. In this setting, we provide the first known analysis of data reconstruction in unbounded, high dimensional spaces and obtain query complexity lower bounds that are nearly tight modulo logarithmic factors.
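
For readers unfamiliar with the three privacy notions named in the abstract, the block below sketches their standard textbook definitions. The notation is an assumption introduced here for illustration and is not quoted from the paper: $M$ is a randomized learner, $D, D'$ are neighboring datasets, $S$ is a measurable set of outputs, and $d$ is the metric on the data space. The paper's own formulations may differ in detail.

```latex
% Standard definitions; the symbols M, D, D', S, d are assumed notation,
% not taken from the paper itself.

% (\epsilon, \delta)-DP: for all neighboring datasets D, D' and measurable S,
\[
  \Pr[M(D) \in S] \;\le\; e^{\epsilon} \, \Pr[M(D') \in S] + \delta .
\]

% (\alpha, \epsilon)-Renyi DP, for \alpha > 1: the Renyi divergence of
% order \alpha between the output distributions is at most \epsilon.
\[
  D_{\alpha}\bigl(M(D) \,\|\, M(D')\bigr)
  = \frac{1}{\alpha - 1}
    \log \mathbb{E}_{y \sim M(D')}
    \left[ \left( \frac{p_{M(D)}(y)}{p_{M(D')}(y)} \right)^{\alpha} \right]
  \;\le\; \epsilon .
\]

% \epsilon-Metric DP on a metric space (X, d): for all inputs x, x' in X
% and measurable S, the guarantee degrades with the distance d(x, x').
\[
  \Pr[M(x) \in S] \;\le\; e^{\epsilon \, d(x, x')} \, \Pr[M(x') \in S] .
\]
```

Setting $\delta = 0$ in the first inequality recovers $\epsilon$-DP, and setting $\epsilon = 0$ gives $(0, \delta)$-DP, which are the two corollary cases the abstract mentions.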

References

Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security

Deep learning with differential privacy

Martin Abadi, Andy Chu, Ian Goodfellow, H. Brendan McMahan, Ilya Mironov, Kunal Talwar, Li Zhang

Published: 2016

Mathematical Research Letters

Moment estimates derived from Poincaré and logarithmic Sobolev inequalities

Shigeki Aida, Daniel Stroock

Published: 1994

Differentially private simple linear regression

Daniel Alabi, Audra McMillan, Jayshree Sarathy, Adam Smith, Salil Vadhan

Published: 2020

Tracking DP budget while handling basic SQL queries

Joshua Allen, Janardhan Kulkarni, Abhradeep Thakurta, Sergey Yekhanin

Learning with privacy at scale

Differential Privacy Team, Apple

Binomial Distribution

Wiki Article

arxiv

Cited by 13

Reconstructing Training Data with Informed Adversaries

Borja Balle, Giovanni Cherubin, Jamie Hayes

Published: 1.13.2022

Given access to a machine learning model, can an adversary reconstruct the model's training data? This work studies this question from the lens of a powerful informed adversary who knows all the training data points except one. By instantiating concrete attacks, we show it is feasible to reconstruct the remaining data point in this stringent threat model. For convex models (e.g. logistic regression), reconstruction attacks are simple and can be derived in closed-form. For more general models (e.g. neural networks), we propose an attack strategy based on training a reconstructor network that receives as input the weights of the model under attack and produces as output the target data point. We demonstrate the effectiveness of our attack on image classifiers trained on MNIST and CIFAR-10, and systematically investigate which factors of standard machine learning pipelines affect reconstruction success. Finally, we theoretically investigate what amount of differential privacy suffices to mitigate reconstruction attacks by informed adversaries. Our work provides an effective reconstruction attack that model developers can use to assess memorization of individual points in general settings beyond those considered in previous works (e.g. generative language models or access to training gradients); it shows that standard models have the capacity to store enough information to enable high-fidelity reconstruction of training data points; and it demonstrates that differential privacy can successfully mitigate such attacks in a parameter regime where utility degradation is minimal.

Reconstruction Attack Poisoning Data Selection Strategy

Probability Theory and Related Fields

Poincaré’s inequalities and Talagrand’s concentration phenomenon for the exponential distribution

Sergey Bobkov, Michel Ledoux

Published: 1997

Geometric Aspects of Functional Analysis: Israel Seminar 2001-2002

Spectral gap and concentration for some spherically symmetric probability measures

Sergey G Bobkov

Published: 2003

Private measures, random walks, and synthetic data

March Boedihardjo, Thomas Strohmer, Roman Vershynin

Published: 2022

A corrective view of neural networks: Representation, memorization and learning

Guy Bresler, Dheeraj Nagaraj

Published: 2020

OpenAI Technical Report

Language models are few-shot learners

T. B. Brown, B. Mann, N. Ryder, M. Subbiah, J. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell, S. Agarwal, A. Herbert-Voss, G. Krueger, T. Henighan, R. Child, A. Ramesh, D. M. Ziegler, J. Wu, C. Winter, C. Hesse, M. Chen, E. Sigler, M. Litwin, S. Gray, B. Chess, J. Clark, C. Berner, S. McCandlish, A. Radford, I. Sutskever, D. Amodei

Published: 2020

Advances in Neural Information Processing Systems

Network size and size of the weights in memorization with two-layers neural networks

Sebastien Bubeck, Ronen Eldan, Yin Tat Lee, Dan Mikulincer

Published: 2020

Theory of Cryptography Conference

Concentrated differential privacy: Simplifications, extensions, and lower bounds

Mark Bun, Thomas Steinke

Published: 2016

Asymptotic methods in statistical decision theory

Lucien Le Cam

Published: 1986

Extracting training data from large language models

Nicholas Carlini, Florian Tramer, Eric Wallace, Matthew Jagielski, Ariel Herbert-Voss, Katherine Lee, Adam Roberts, Tom Brown, Dawn Song, Ulfar Erlingsson, Alina Oprea, Colin Raffel

Published: 2021

Proc. of PETS

Broadening the scope of differential privacy using metrics

K. Chatzikokolakis, M. E. Andres, N. E. Bordenabe, C. Palamidessi

Published: 2013

Advances in Neural Information Processing Systems

Privacy-preserving logistic regression

Kamalika Chaudhuri, Claire Monteleoni

Published: 2008

Differentially private empirical risk minimization

Kamalika Chaudhuri, Claire Monteleoni, Anand D. Sarwate

Published: 2009

Proceedings of the Twenty-Second ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems

Revealing information while preserving privacy

Irit Dinur, Kobbi Nissim

Foundations and Trends in Theoretical Computer Science

The Algorithmic Foundations of Differential Privacy

Cynthia Dwork, Aaron Roth

Published: 2014

Concentrated differential privacy

Cynthia Dwork, Guy N. Rothblum

Published: 2016

Advances in Cryptology - EUROCRYPT 2006

Our data, ourselves: Privacy via distributed noise generation

C. Dwork, K. Kenthapadi, F. McSherry, I. Mironov, M. Naor

Published: 2006

Privacy amplification by iteration

Vitaly Feldman, Ilya Mironov, Kunal Talwar, Abhradeep Thakurta

Published: 2018

arxiv

Cited by 1

International Conference on Principles of Security and Trust (POST)

Generalised Differential Privacy for Text Document Processing

Natasha Fernandes, Mark Dras, Annabelle McIver

Published: 11.26.2018

We address the problem of how to "obfuscate" texts by removing stylistic clues which can identify authorship, whilst preserving (as much as possible) the content of the text. In this paper we combine ideas from "generalised differential privacy" and machine learning techniques for text processing to model privacy for text documents. We define a privacy mechanism that operates at the level of text documents represented as "bags-of-words" - these representations are typical in machine learning and contain sufficient information to carry out many kinds of classification tasks including topic identification and authorship attribution (of the original documents). We show that our mechanism satisfies privacy with respect to a metric for semantic similarity, thereby providing a balance between utility, defined by the semantic content of texts, with the obfuscation of stylistic clues. We demonstrate our implementation on a "fan fiction" dataset, confirming that it is indeed possible to disguise writing style effectively whilst preserving enough information and variation for accurate content classification tasks.

Differential Privacy Model Inversion Application of Text Classification

The Laplace mechanism has optimal utility for differential privacy over continuous queries

Natasha Fernandes, Annabelle McIver, Carroll Morgan

Published: 2021

SA-DPSGD: Differentially private stochastic gradient descent based on simulated annealing

Jie Fu, Zhili Chen, XinPeng Ling

Published: 2022

Scalar Poincaré implies matrix Poincaré

Ankit Garg, Tarun Kathuria, Nikhil Srivastava

Published: 2020

Journal of the ACM (JACM)

Property testing and its connection to learning and approximation

Oded Goldreich, Shafi Goldwasser, Dana Ron

Published: 1998

Proceedings of the 39th International Conference on Machine Learning

Bounding training data reconstruction in private (deep) learning

Chuan Guo, Brian Karrer, Kamalika Chaudhuri, Laurens van der Maaten

Published: 2022

From Poincaré inequalities to nonlinear matrix concentration

De Huang, Joel A Tropp

Published: 2021

ArXiv

Differentially private learning does not bound membership inference

Thomas Humphries, Matthew Rafuse, Lindsey Tulloch, Simon Oya, Ian Goldberg, Florian Kerschbaum

Published: 2020

UAI 2022

Balancing utility and scalability in metric differential privacy

Jacob Imola, Shiva Kasiviswanathan, Stephen White, Abhinav Aggarwal, Nathanael Teissier

Published: 2022

Differentially private online learning

Prateek Jain, Pravesh Kothari, Abhradeep Thakurta

Published: 2011

International conference on machine learning

The composition theorem for differential privacy

Peter Kairouz, Sewoong Oh, Pramod Viswanath

Published: 2015

MIT Press

An introduction to computational learning theory

Michael J Kearns, Umesh Vazirani

Published: 1994

Journal of Privacy and Confidentiality

Gradual release of sensitive data under differential privacy

Fragkiskos Koufogiannis, Shuo Han, George J. Pappas

Published: 2017

Proceedings of the IEEE

Gradient-based learning applied to document recognition

Y. Lecun, L. Bottou, Y. Bengio, P. Haffner

Published: 1998

American Mathematical Society

Markov chains and mixing times

David A Levin, Yuval Peres

Published: 2017

Optimal membership inference bounds for adaptive composition of sampled gaussian mechanisms

Saeed Mahloujifar, Alexandre Sablayrolles, Graham Cormode, Somesh Jha

Published: 2022

48th Annual IEEE Symposium on Foundations of Computer Science

Mechanism design via differential privacy

Frank McSherry, Kunal Talwar

Published: 2007

Rényi differential privacy

Ilya Mironov

Published: 2017

Theory of Cryptography Conference

The complexity of computing the optimal composition of differential privacy

Jack Murtagh, Salil Vadhan

Published: 2015

International Conference on Machine Learning

Zero-shot text-to-image generation

Aditya Ramesh, Mikhail Pavlov, Gabriel Goh, Scott Gray, Chelsea Voss, Alec Radford, Mark Chen, Ilya Sutskever

Published: 2021

McGraw-Hill

Principles of mathematical analysis

Walter Rudin

Published: 1976

arxiv

Cited by 3

SoK: Let the Privacy Games Begin! A Unified Treatment of Data Inference Privacy in Machine Learning

Ahmed Salem, Giovanni Cherubin, David Evans, Boris Köpf, Andrew Paverd, Anshuman Suri, Shruti Tople, Santiago Zanella-Béguelin

Published: 12.21.2022

Deploying machine learning models in production may allow adversaries to infer sensitive information about training data. There is a vast literature analyzing different types of inference risks, ranging from membership inference to reconstruction attacks. Inspired by the success of games (i.e., probabilistic experiments) to study security properties in cryptography, some authors describe privacy inference risks in machine learning using a similar game-based style. However, adversary capabilities and goals are often stated in subtly different ways from one presentation to the other, which makes it hard to relate and compose results. In this paper, we present a game-based framework to systematize the body of knowledge on privacy inference risks in machine learning. We use this framework to (1) provide a unifying structure for definitions of inference risks, (2) formally establish known relations among definitions, and (3) to uncover hitherto unknown relations that would have been difficult to spot otherwise.

Membership Inference Privacy Enhancing Technology Data Privacy Assessment

USENIX Association

Auditing data privacy for machine learning

Reza Shokri

Published: 2022

IEEE Global Conference on Signal and Information Processing

Stochastic gradient descent with differentially private updates

S. Song, K. Chaudhuri, A. D. Sarwate

Published: 2013

arxiv

Cited by 1

Defending against Reconstruction Attacks with Rényi Differential Privacy

Pierre Stock, Igor Shilov, Ilya Mironov, Alexandre Sablayrolles

Published: 2.16.2022

Reconstruction attacks allow an adversary to regenerate data samples of the training set using access to only a trained model. It has been recently shown that simple heuristics can reconstruct data samples from language models, making this threat scenario an important aspect of model release. Differential privacy is a known solution to such attacks, but is often used with a relatively large privacy budget (epsilon > 8) which does not translate to meaningful guarantees. In this paper we show that, for a same mechanism, we can derive privacy guarantees for reconstruction attacks that are better than the traditional ones from the literature. In particular, we show that larger privacy budgets do not protect against membership inference, but can still protect extraction of rare secrets. We show experimentally that our guarantees hold against various language models, including GPT-2 finetuned on Wikitext-103.

Privacy Risk Management Membership Inference Membership Disclosure Risk

Differentially private learning needs better features (or much more data)

F. Tramer, D. Boneh

Published: 2020

Introduction to nonparametric estimation

Alexandre B Tsybakov

Published: 2009

Springer

The Complexity of Differential Privacy

Salil Vadhan

Published: 2017

arXiv

On the optimal memorization power of ReLU neural networks

Gal Vardi, Gilad Yehudai, Ohad Shamir

Published: 2021

Theory of games and economic behavior

John Von Neumann, Oskar Morgenstern

Published: 1947

Cambridge University Press

High-dimensional statistics: A non-asymptotic viewpoint.

Wainwright, M.J.

Published: 2019

Journal of the American Statistical Association

Randomized response: a survey technique for eliminating evasive answer bias

Warner, S. L.

Published: 1965

arxiv

Cited by 22

IEEE Computer Security Foundations Symposium (CSF)

Privacy Risk in Machine Learning: Analyzing the Connection to Overfitting

Samuel Yeom, Irene Giacomelli, Matt Fredrikson, Somesh Jha

Published: 9.6.2017

Machine learning algorithms, when applied to sensitive data, pose a distinct threat to privacy. A growing body of prior work demonstrates that models produced by these algorithms may leak specific private information in the training data to an attacker, either through the models' structure or their observable behavior. However, the underlying cause of this privacy risk is not well understood beyond a handful of anecdotal accounts that suggest overfitting and influence might play a role. This paper examines the effect that overfitting and influence have on the ability of an attacker to learn information about the training data from machine learning models, either through training set membership inference or attribute inference attacks. Using both formal and empirical analyses, we illustrate a clear relationship between these factors and the privacy risk that arises in several popular machine learning algorithms. We find that overfitting is sufficient to allow an attacker to perform membership inference and, when the target attribute meets certain conditions about its influence, attribute inference attacks. Interestingly, our formal analysis also shows that overfitting is not necessary for these attacks and begins to shed light on what other factors may be in play. Finally, we explore the connection between membership inference and attribute inference, showing that there are deep connections between the two that lead to effective new attacks.

Privacy Leakage Membership Inference Privacy Analysis

arxiv

Cited by 1

Federated Learning

Deep Leakage from Gradients

Ligeng Zhu, Zhijian Liu, Song Han

Published: 6.21.2019

Exchanging gradients is a widely used method in modern multi-node machine learning system (e.g., distributed training, collaborative learning). For a long time, people believed that gradients are safe to share: i.e., the training data will not be leaked by gradient exchange. However, we show that it is possible to obtain the private training data from the publicly shared gradients. We name this leakage as Deep Leakage from Gradient and empirically validate the effectiveness on both computer vision and natural language processing tasks. Experimental results show that our attack is much stronger than previous approaches: the recovery is pixel-wise accurate for images and token-wise matching for texts. We want to raise people's awareness to rethink the gradient's safety. Finally, we discuss several possible strategies to prevent such deep leakage. The most effective defense method is gradient pruning.

Privacy Protection Defensive Deception Adversarial attack