Closed-Form Bounds for DP-SGD against Record-level Inference

23rd ACM SIGSAC Conference on Computer and Communications Security, CCS 2016

Deep learning with differential privacy

Martín Abadi, Andy Chu, Ian Goodfellow, H Brendan McMahan, Ilya Mironov, Kunal Talwar, Li Zhang

Published: 2016

International Conference on Machine Learning

Improving the Gaussian mechanism for differential privacy: Analytical calibration and optimal denoising

Borja Balle, Yu-Xiang Wang

Published: 2018

arXiv

Cited by 1

Computing Research Repository (CoRR)

On the Importance of Architecture and Feature Selection in Differentially Private Machine Learning

Wenxuan Bao, Luke A. Bauer, Vincent Bindschaedler

Published: 2022.5.14

We study a pitfall in the typical workflow for differentially private machine learning. The use of differentially private learning algorithms in a "drop-in" fashion -- without accounting for the impact of differential privacy (DP) noise when choosing what feature engineering operations to use, what features to select, or what neural network architecture to use -- yields overly complex and poorly performing models. In other words, by anticipating the impact of DP noise, a simpler and more accurate alternative model could have been trained for the same privacy guarantee. We systematically study this phenomenon through theory and experiments. On the theory front, we provide an explanatory framework and prove that the phenomenon arises naturally from the addition of noise to satisfy differential privacy. On the experimental front, we demonstrate how the phenomenon manifests in practice using various datasets, types of models, tasks, and neural network architectures. We also analyze the factors that contribute to the problem and distill our experimental insights into concrete takeaways that practitioners can follow when training models with differential privacy. Finally, we propose privacy-aware algorithms for feature selection and neural network architecture search. We analyze their differential privacy properties and evaluate them empirically.

Model selection, privacy evaluation, performance evaluation

Sov. Math., Dokl

Estimates of the proximity of Gaussian measures

SS Barsov, Vladimir V Ul’yanov

Published: 1987

2022 IEEE Symposium on Security and Privacy (SP)

Membership inference attacks from first principles

Nicholas Carlini, Steve Chien, Milad Nasr, Shuang Song, Andreas Terzis, Florian Tramer

Published: 2022

2023 IEEE 36th Computer Security Foundations Symposium (CSF)

Bayes security: A not so average metric

K. Chatzikokolakis, G. Cherubin, C. Palamidessi, C. Troncoso

Published: 2023

arXiv

Cited by 1

Proc. Priv. Enhancing Technol.

Bayes, not Naïve: Security Bounds on Website Fingerprinting Defenses

Giovanni Cherubin

Published: 2017.2.25

Website Fingerprinting (WF) attacks raise major concerns about users' privacy. They employ Machine Learning (ML) to allow a local passive adversary to uncover the Web browsing behavior of a user, even if she browses through an encrypted tunnel (e.g. Tor, VPN). Numerous defenses have been proposed in the past; however, it is typically difficult to have formal guarantees on their security, which is most often evaluated empirically against state-of-the-art attacks. In this paper, we present a practical method to derive security bounds for any WF defense, which depend on a chosen feature set. This result derives from reducing WF attacks to an ML classification task, where we can determine the smallest achievable error (the Bayes error); such error can be estimated in practice, and is a lower bound for a WF adversary, for any classification algorithm he may use. Our work has two main consequences: i) it allows determining the security of WF defenses, in a black-box manner, with respect to the state-of-the-art feature set and ii) it favors shifting the focus of future WF research to the identification of optimal feature sets. The generality of the approach further suggests that the method could be used to define security bounds for other ML-based attacks.

Privacy-preserving mechanisms, backdoor attacks, website fingerprinting

Royal Holloway, University of London

Black-box security: measuring black-box information leakage via machine learning

Giovanni Cherubin

Published: 2019

2019 IEEE Symposium on Security and Privacy (SP)

F-BLEAU: fast black-box leakage estimation

Giovanni Cherubin, Konstantinos Chatzikokolakis, Catuscia Palamidessi

Published: 2019

Technometrics

Characterizations of an empirical influence function for detecting influential cases in regression

R. D. Cook, S. Weisberg

Published: 1980

John Wiley & Sons

Elements of information theory

Thomas M Cover

Published: 1999

The total variation distance between high-dimensional Gaussians with the same mean

Luc Devroye, Abbas Mehrabian, Tommy Reddad

Published: 2018

Gaussian differential privacy

Jinshuo Dong, Aaron Roth, Weijie J. Su

Published: 2019

Proceedings on Privacy Enhancing Technologies

Connect the dots: Tighter discrete approximations of privacy loss distributions

Vadym Doroshenko, Badih Ghazi, Pritish Kamath, Ravi Kumar, Pasin Manurangsi

Published: 2022

Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security

Model inversion attacks that exploit confidence information and basic countermeasures

Matt Fredrikson, Somesh Jha, Thomas Ristenpart

Published: 2015

Advances in Neural Information Processing Systems

Numerical composition of differential privacy

S. Gopi, Y. T. Lee, L. Wutschitz

Published: 2021

Journal of the American Statistical Association

The influence curve and its role in robust estimation

Frank R Hampel

Published: 1974

2007 IEEE International Conference on Acoustics, Speech and Signal Processing-ICASSP’07

Approximating the Kullback-Leibler divergence between Gaussian mixture models

John R Hershey, Peder A Olsen

Published: 2007

arXiv

Differentially private learning does not bound membership inference

Thomas Humphries, Matthew Rafuse, Lindsey Tulloch, Simon Oya, Ian Goldberg, Florian Kerschbaum

Published: 2020

Journal of Algorithms and Computation

Efficient approximation algorithms for point-set diameter in higher dimensions

Mahdi Imanparast, Seyed Naser Hashemi, Ali Mohades

Published: 2019

Association for Computing Machinery

No free lunch in data privacy

Published: 2011

arXiv

Cited by 16

International Conference on Machine Learning (ICML)

Understanding Black-box Predictions via Influence Functions

Pang Wei Koh, Percy Liang

Published: 2017.3.15

How can we explain the predictions of a black-box model? In this paper, we use influence functions -- a classic technique from robust statistics -- to trace a model's prediction through the learning algorithm and back to its training data, thereby identifying training points most responsible for a given prediction. To scale up influence functions to modern machine learning settings, we develop a simple, efficient implementation that requires only oracle access to gradients and Hessian-vector products. We show that even on non-convex and non-differentiable models where the theory breaks down, approximations to influence functions can still provide valuable information. On linear models and convolutional neural networks, we demonstrate that influence functions are useful for multiple purposes: understanding model behavior, debugging models, detecting dataset errors, and even creating visually-indistinguishable training-set attacks.

Training improvement, poisoning, explainability evaluation

Computing differential privacy guarantees for heterogeneous compositions using FFT

Antti Koskela, Antti Honkela

Published: 2021

Tight differential privacy for discrete-valued mechanisms and for the subsampled Gaussian mechanism using FFT

Antti Koskela, Joonas Jälkö, Lukas Prediger, Antti Honkela

Published: 2020

Proceedings of the 2013 ACM SIGSAC conference on Computer & communications security

Membership privacy: A unifying framework for privacy definitions

N. Li, W. Qardaji, D. Su, Y. Wu, W. Yang

Published: 2013

Optimal membership inference bounds for adaptive composition of sampled Gaussian mechanisms

Saeed Mahloujifar, Alexandre Sablayrolles, Graham Cormode, Somesh Jha

Published: 2022

arXiv

Cited by 8

Rényi Differential Privacy of the Sampled Gaussian Mechanism

Ilya Mironov, Kunal Talwar, Li Zhang

Published: 2019.8.28

The Sampled Gaussian Mechanism (SGM), a composition of subsampling and additive Gaussian noise, has been successfully used in a number of machine learning applications. The mechanism's unexpected power is derived from privacy amplification by sampling, where the privacy cost of a single evaluation diminishes quadratically, rather than linearly, with the sampling rate. Characterizing the precise privacy properties of SGM motivated the development of several relaxations of the notion of differential privacy. This work unifies and fills in gaps in published results on SGM. We describe a numerically stable procedure for precise computation of SGM's Rényi Differential Privacy and prove a nearly tight (within a small constant factor) closed-form bound.

Privacy evaluation, information-theoretic privacy, sample complexity

arXiv

Cited by 1

Enhanced Membership Inference Attacks against Machine Learning Models

Jiayuan Ye, Aadyaa Maddi, Sasi Kumar Murakonda, Vincent Bindschaedler, Reza Shokri

Published: 2021.11.18

How much does a machine learning algorithm leak about its training data, and why? Membership inference attacks are used as an auditing tool to quantify this leakage. In this paper, we present a comprehensive hypothesis testing framework that enables us not only to formally express the prior work in a consistent way, but also to design new membership inference attacks that use reference models to achieve a significantly higher power (true positive rate) for any (false positive rate) error. More importantly, we explain why different attacks perform differently. We present a template for indistinguishability games, and provide an interpretation of attack success rate across different instances of the game. We discuss various uncertainties of attackers that arise from the formulation of the problem, and show how our approach tries to minimize the attack uncertainty to the one bit secret about the presence or absence of a data point in the training set. We perform a differential analysis between all types of attacks, explain the gap between them, and show what causes data points to be vulnerable to an attack (as the reasons vary due to different granularities of memorization, from overfitting to conditional memorization). Our auditing framework is openly accessible as part of the Privacy Meter software tool.

Adversarial attacks, membership inference, poisoning

International Conference on Foundations of Software Science and Computational Structures

On the foundations of quantitative information flow

Geoffrey Smith

Published: 2009

Proceedings on Privacy Enhancing Technologies

Privacy loss classes: The central limit theorem in differential privacy

David Sommer, Sebastian Meiser, Esfandiar Mohammadi

Published: 2019

arXiv

Cited by 4

Subsampled Rényi Differential Privacy and Analytical Moments Accountant

Yu-Xiang Wang, Borja Balle, Shiva Kasiviswanathan

Published: 2018.8.1

We study the problem of subsampling in differential privacy (DP), a question that is the centerpiece behind many successful differentially private machine learning algorithms. Specifically, we provide a tight upper bound on the Rényi Differential Privacy (RDP) (Mironov, 2017) parameters for algorithms that: (1) subsample the dataset, and then (2) apply a randomized mechanism M to the subsample, in terms of the RDP parameters of M and the subsampling probability parameter. Our results generalize the moments accounting technique, developed by Abadi et al. (2016) for the Gaussian mechanism, to any subsampled RDP mechanism.

Privacy evaluation, differential privacy, properties of RDP

SIAM Journal on Computing

On constructing minimum spanning trees in k-dimensional spaces and related problems

Andrew Chi-Chih Yao

Published: 1982

Journal of Computer Security

Overfitting, robustness, and malicious algorithms: A study of potential causes of privacy risk in machine learning

Samuel Yeom, Irene Giacomelli, Alan Menaged, Matt Fredrikson, Somesh Jha

Published: 2020

arXiv

Cited by 4

Computing Research Repository (CoRR)

Opacus: User-Friendly Differential Privacy Library in PyTorch

Ashkan Yousefpour, Igor Shilov, Alexandre Sablayrolles, Davide Testuggine, Karthik Prasad, Mani Malek, John Nguyen, Sayan Ghosh, Akash Bharadwaj, Jessica Zhao, Graham Cormode, Ilya Mironov

Published: 2021.9.25

We introduce Opacus, a free, open-source PyTorch library for training deep learning models with differential privacy (hosted at opacus.ai). Opacus is designed for simplicity, flexibility, and speed. It provides a simple and user-friendly API, and enables machine learning practitioners to make a training pipeline private by adding as little as two lines to their code. It supports a wide variety of layers, including multi-head attention, convolution, LSTM, GRU (and generic RNN), and embedding, right out of the box and provides the means for supporting other user-defined layers. Opacus computes batched per-sample gradients, providing higher efficiency compared to the traditional "micro batch" approach. In this paper we present Opacus, detail the principles that drove its implementation and unique features, and benchmark it against other frameworks for training models with differential privacy as well as standard PyTorch.

DP-SGD, performance evaluation, library, classification

arXiv

Cited by 1

International Conference on Machine Learning (ICML)

Bayesian Estimation of Differential Privacy

Santiago Zanella-Béguelin, Lukas Wutschitz, Shruti Tople, Ahmed Salem, Victor Rühle, Andrew Paverd, Mohammad Naseri, Boris Köpf, Daniel Jones

Published: 2022.6.11

Algorithms such as Differentially Private SGD enable training machine learning models with formal privacy guarantees. However, there is a discrepancy between the protection that such algorithms guarantee in theory and the protection they afford in practice. An emerging strand of work empirically estimates the protection afforded by differentially private training as a confidence interval for the privacy budget $\varepsilon$ spent on training a model. Existing approaches derive confidence intervals for $\varepsilon$ from confidence intervals for the false positive and false negative rates of membership inference attacks. Unfortunately, obtaining narrow high-confidence intervals for $\varepsilon$ using this method requires an impractically large sample size and training as many models as samples. We propose a novel Bayesian method that greatly reduces sample size, and adapt and validate a heuristic to draw more than one sample per trained model. Our Bayesian method exploits the hypothesis testing interpretation of differential privacy to obtain a posterior for $\varepsilon$ (not just a confidence interval) from the joint posterior of the false positive and false negative rates of membership inference attacks. For the same sample size and confidence, we derive confidence intervals for $\varepsilon$ around 40% narrower than prior work. The heuristic, which we adapt from label-only DP, can be used to further reduce the number of trained models needed to get enough samples by up to 2 orders of magnitude.

Bayes security, privacy evaluation, deep learning methods

arXiv

Cited by 1

Attribute Privacy: Framework and Mechanisms

Wanrong Zhang, Olga Ohrimenko, Rachel Cummings

Published: 2020.9.9

Ensuring the privacy of training data is a growing concern since many machine learning models are trained on confidential and potentially sensitive data. Much attention has been devoted to methods for protecting individual privacy during analyses of large datasets. However, in many settings, global properties of the dataset may also be sensitive (e.g., mortality rate in a hospital rather than presence of a particular patient in the dataset). In this work, we depart from individual privacy to initiate the study of attribute privacy, where a data owner is concerned about revealing sensitive properties of a whole dataset during analysis. We propose definitions to capture attribute privacy in two relevant cases where global attributes may need to be protected: (1) properties of a specific dataset and (2) parameters of the underlying distribution from which the dataset is sampled. We also provide two efficient mechanisms and one inefficient mechanism that satisfy attribute privacy for these settings. We base our results on a novel use of the Pufferfish framework to account for correlations across attributes in the data, thus addressing "the challenging problem of developing Pufferfish instantiations and algorithms for general aggregate secrets" that was left open by Kifer and Machanavajjhala (2014).

Data generation, machine learning techniques, cryptography