Learn What You Want to Unlearn: Unlearning Inversion Attacks against Machine Unlearning

Stanford Law Review

The right to be forgotten

J. Rosen

Published: 2011

J. Tech. L. & Pol’y

The California Consumer Privacy Act: Towards a European-style privacy regime in the United States

S. L. Pardau

Published: 2018

2015 IEEE Symposium on Security and Privacy (SP)

Towards making systems forget with machine unlearning

Y. Cao, J. Yang

Published: 2015

arxiv

European Symposium on Security and Privacy (EuroS&P)

Unrolling SGD: Understanding Factors Influencing Machine Unlearning

Anvith Thudi, Gabriel Deza, Varun Chandrasekaran, Nicolas Papernot

Published: 2021.9.28

Machine unlearning is the process through which a deployed machine learning model is made to forget about some of its training data points. While naively retraining the model from scratch is an option, it is almost always associated with large computational overheads for deep learning models. Thus, several approaches to approximately unlearn have been proposed along with corresponding metrics that formalize what it means for a model to forget about a data point. In this work, we first taxonomize approaches and metrics of approximate unlearning. As a result, we identify verification error, i.e., the L2 difference between the weights of an approximately unlearned and a naively retrained model, as an approximate unlearning metric that should be optimized for as it subsumes a large class of other metrics. We theoretically analyze the canonical training algorithm, stochastic gradient descent (SGD), to surface the variables which are relevant to reducing the verification error of approximate unlearning for SGD. From this analysis, we first derive an easy-to-compute proxy for verification error (termed unlearning error). The analysis also informs the design of a new training objective penalty that limits the overall change in weights during SGD and as a result facilitates approximate unlearning with lower verification error. We validate our theoretical work through an empirical evaluation on learning with CIFAR-10, CIFAR-100, and IMDB sentiment analysis.
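
As a small illustration of the verification error named in the abstract above (the L2 difference between the weights of an approximately unlearned model and a naively retrained one), the sketch below computes that distance for two flattened weight vectors. The function name `verification_error` and the toy weight values are assumptions made for illustration; this is not code from the paper.

```python
import math


def verification_error(unlearned_weights, retrained_weights):
    """L2 distance between two flattened weight vectors of equal length."""
    if len(unlearned_weights) != len(retrained_weights):
        raise ValueError("weight vectors must have the same length")
    squared = sum(
        (u - r) ** 2 for u, r in zip(unlearned_weights, retrained_weights)
    )
    return math.sqrt(squared)


# Hypothetical toy weights: the two models differ by (3, 4) in weight space.
approx_unlearned = [0.0, 0.0]
retrained = [3.0, 4.0]
print(verification_error(approx_unlearned, retrained))  # 5.0
```

A smaller value means the approximately unlearned model sits closer, in weight space, to the model that would have resulted from retraining without the deleted points.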

Regularization, performance evaluation, algorithms

2021 IEEE Symposium on Security and Privacy (S&P)

Machine unlearning

L. Bourtoule, V. Chandrasekaran, C. A. Choquette-Choo, H. Jia, A. Travers, B. Zhang, D. Lie, N. Papernot

Published: 2021

International Conference on Machine Learning

Deltagrad: Rapid retraining of machine learning models

Y. Wu, E. Dobriban, S. Davidson

Published: 2020

arxiv

Certified Data Removal from Machine Learning Models

Chuan Guo, Tom Goldstein, Awni Hannun, Laurens van der Maaten

Published: 2019.11.8

Good data stewardship requires removal of data at the request of the data's owner. This raises the question if and how a trained machine-learning model, which implicitly stores information about its training data, should be affected by such a removal request. Is it possible to "remove" data from a machine-learning model? We study this problem by defining certified removal: a very strong theoretical guarantee that a model from which data is removed cannot be distinguished from a model that never observed the data to begin with. We develop a certified-removal mechanism for linear classifiers and empirically study learning settings in which this mechanism is practical.

Machine unlearning, data deletion algorithms, privacy evaluation

Proceedings of the 2022 ACM SIGSAC Conference on Computer and Communications Security

Graph unlearning

M. Chen, Z. Zhang, T. Wang, M. Backes, M. Humbert, Y. Zhang

Published: 2022

Advances in Neural Information Processing Systems

Adaptive machine unlearning

V. Gupta, C. Jung, S. Neel, A. Roth, S. Sharifi-Malvajerdi, C. Waites

Published: 2021

arxiv

International Conference on Machine Learning (ICML)

Machine Unlearning for Random Forests

Jonathan Brophy, Daniel Lowd

Published: 2020.9.12

Responding to user data deletion requests, removing noisy examples, or deleting corrupted training data are just a few reasons for wanting to delete instances from a machine learning (ML) model. However, efficiently removing this data from an ML model is generally difficult. In this paper, we introduce data removal-enabled (DaRE) forests, a variant of random forests that enables the removal of training data with minimal retraining. Model updates for each DaRE tree in the forest are exact, meaning that removing instances from a DaRE model yields exactly the same model as retraining from scratch on updated data. DaRE trees use randomness and caching to make data deletion efficient. The upper levels of DaRE trees use random nodes, which choose split attributes and thresholds uniformly at random. These nodes rarely require updates because they only minimally depend on the data. At the lower levels, splits are chosen to greedily optimize a split criterion such as Gini index or mutual information. DaRE trees cache statistics at each node and training data at each leaf, so that only the necessary subtrees are updated as data is removed. For numerical attributes, greedy nodes optimize over a random subset of thresholds, so that they can maintain statistics while approximating the optimal threshold. By adjusting the number of thresholds considered for greedy nodes, and the number of random nodes, DaRE trees can trade off between more accurate predictions and more efficient updates. In experiments on 13 real-world datasets and one synthetic dataset, we find DaRE forests delete data orders of magnitude faster than retraining from scratch while sacrificing little to no predictive power.

Machine unlearning, data deletion algorithms, performance evaluation metrics

31st USENIX Security Symposium (USENIX Security 22)

On the necessity of auditable algorithmic definitions for machine unlearning

A. Thudi, H. Jia, I. Shumailov, N. Papernot

Published: 2022

Proceedings of the AAAI Conference on Artificial Intelligence

Hard to forget: Poisoning attacks on certified machine unlearning

N. G. Marchant, B. I. Rubinstein, S. Alfeld

Published: 2022

arxiv

Network and Distributed System Security Symposium (NDSS)

A Duty to Forget, a Right to be Assured? Exposing Vulnerabilities in Machine Unlearning Services

Hongsheng Hu, Shuo Wang, Jiamin Chang, Haonan Zhong, Ruoxi Sun, Shuang Hao, Haojin Zhu, Minhui Xue

Published: 2023.9.15

The right to be forgotten requires the removal or "unlearning" of a user's data from machine learning models. However, in the context of Machine Learning as a Service (MLaaS), retraining a model from scratch to fulfill the unlearning request is impractical due to the lack of training data on the service provider's side (the server). Furthermore, approximate unlearning further embraces a complex trade-off between utility (model performance) and privacy (unlearning performance). In this paper, we try to explore the potential threats posed by unlearning services in MLaaS, specifically over-unlearning, where more information is unlearned than expected. We propose two strategies that leverage over-unlearning to measure the impact on the trade-off balancing, under black-box access settings, in which the existing machine unlearning attacks are not applicable. The effectiveness of these strategies is evaluated through extensive experiments on benchmark datasets, across various model architectures and representative unlearning approaches. Results indicate significant potential for both strategies to undermine model efficacy in unlearning scenarios. This study uncovers an underexplored gap between unlearning and contemporary MLaaS, highlighting the need for careful considerations in balancing data unlearning, model utility, and security.

Overfitting and memorization, privacy methods, data protection methods

arxiv

When Machine Unlearning Jeopardizes Privacy

Min Chen, Zhikun Zhang, Tianhao Wang, Michael Backes, Mathias Humbert, Yang Zhang

Published: 2020.5.5

The right to be forgotten states that a data owner has the right to erase their data from an entity storing it. In the context of machine learning (ML), the right to be forgotten requires an ML model owner to remove the data owner's data from the training set used to build the ML model, a process known as machine unlearning. While originally designed to protect the privacy of the data owner, we argue that machine unlearning may leave some imprint of the data in the ML model and thus create unintended privacy risks. In this paper, we perform the first study on investigating the unintended information leakage caused by machine unlearning. We propose a novel membership inference attack that leverages the different outputs of an ML model's two versions to infer whether a target sample is part of the training set of the original model but out of the training set of the corresponding unlearned model. Our experiments demonstrate that the proposed membership inference attack achieves strong performance. More importantly, we show that our attack in multiple cases outperforms the classical membership inference attack on the original ML model, which indicates that machine unlearning can have counterproductive effects on privacy. We notice that the privacy degradation is especially significant for well-generalized ML models where classical membership inference does not perform well. We further investigate four mechanisms to mitigate the newly discovered privacy risks and show that releasing the predicted label only, temperature scaling, and differential privacy are effective. We believe that our results can help improve privacy protection in practical implementations of machine unlearning. Our code is available at https://github.com/MinChen00/UnlearningLeaks.
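
The core observation in the abstract above, that a sample's removal shows up as a difference between the outputs of the original and the unlearned model, can be sketched with a toy decision rule. This is a deliberately simplified stand-in: the abstract does not specify the attack model, and the threshold rule, the function names, and the posterior values below are all assumptions for illustration.

```python
def posterior_difference(original_posterior, unlearned_posterior):
    """Element-wise difference between the two models' output posteriors."""
    return [o - u for o, u in zip(original_posterior, unlearned_posterior)]


def guess_unlearned(original_posterior, unlearned_posterior, true_label,
                    threshold=0.2):
    """Guess that the sample was unlearned if confidence on its true label
    dropped by more than `threshold` (a hypothetical cut-off)."""
    diff = posterior_difference(original_posterior, unlearned_posterior)
    return diff[true_label] > threshold


# Hypothetical posteriors for a two-class model queried before and after unlearning.
print(guess_unlearned([0.9, 0.1], [0.5, 0.5], true_label=0))
print(guess_unlearned([0.6, 0.4], [0.55, 0.45], true_label=0))
```

In practice an adversary would more plausibly learn the decision rule from shadow models than fix a threshold by hand; the sketch only shows which signal is being exploited.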

Membership inference, privacy protection in machine learning, poisoning

arxiv

International Conference on Algorithmic Learning Theory (ALT)

Descent-to-Delete: Gradient-Based Methods for Machine Unlearning

Seth Neel, Aaron Roth, Saeed Sharifi-Malvajerdi

Published: 2020.7.7

We study the data deletion problem for convex models. By leveraging techniques from convex optimization and reservoir sampling, we give the first data deletion algorithms that are able to handle an arbitrarily long sequence of adversarial updates while promising both per-deletion run-time and steady-state error that do not grow with the length of the update sequence. We also introduce several new conceptual distinctions: for example, we can ask that after a deletion, the entire state maintained by the optimization algorithm is statistically indistinguishable from the state that would have resulted had we retrained, or we can ask for the weaker condition that only the observable output is statistically indistinguishable from the observable output that would have resulted from retraining. We are able to give more efficient deletion algorithms under this weaker deletion criterion.

Machine unlearning, learning improvement, data deletion algorithms

arxiv

Conference on Neural Information Processing Systems (NeurIPS)

The Privacy Onion Effect: Memorization is Relative

Nicholas Carlini, Matthew Jagielski, Chiyuan Zhang, Nicolas Papernot, Andreas Terzis, Florian Tramèr

Published: 2022.6.22

Machine learning models trained on private datasets have been shown to leak their private data. While recent work has found that the average data point is rarely leaked, the outlier samples are frequently subject to memorization and, consequently, privacy leakage. We demonstrate and analyse an Onion Effect of memorization: removing the "layer" of outlier points that are most vulnerable to a privacy attack exposes a new layer of previously-safe points to the same attack. We perform several experiments to study this effect, and understand why it occurs. The existence of this effect has various consequences. For example, it suggests that proposals to defend against memorization without training with rigorous privacy guarantees are unlikely to be effective. Further, it suggests that privacy-enhancing technologies such as machine unlearning could actually harm the privacy of other users.

Adversarial learning, membership inference, label inference attacks

arxiv

Enhanced Membership Inference Attacks against Machine Learning Models

Jiayuan Ye, Aadyaa Maddi, Sasi Kumar Murakonda, Vincent Bindschaedler, Reza Shokri

Published: 2021.11.18

How much does a machine learning algorithm leak about its training data, and why? Membership inference attacks are used as an auditing tool to quantify this leakage. In this paper, we present a comprehensive hypothesis testing framework that enables us not only to formally express the prior work in a consistent way, but also to design new membership inference attacks that use reference models to achieve a significantly higher power (true positive rate) for any (false positive rate) error. More importantly, we explain why different attacks perform differently. We present a template for indistinguishability games, and provide an interpretation of attack success rate across different instances of the game. We discuss various uncertainties of attackers that arise from the formulation of the problem, and show how our approach tries to minimize the attack uncertainty to the one bit secret about the presence or absence of a data point in the training set. We perform a differential analysis between all types of attacks, explain the gap between them, and show what causes data points to be vulnerable to an attack (as the reasons vary due to different granularities of memorization, from overfitting to conditional memorization). Our auditing framework is openly accessible as part of the Privacy Meter software tool.

Adversarial attacks, membership inference, poisoning

MIT Press

Deep Learning

I. Goodfellow, Y. Bengio, A. Courville

Published: 2016

Communications of the ACM

Understanding deep learning (still) requires rethinking generalization

C. Zhang, S. Bengio, M. Hardt, B. Recht, O. Vinyals

Published: 2021

Foundations and Trends in Theoretical Computer Science

The Algorithmic Foundations of Differential Privacy

Cynthia Dwork, Aaron Roth

Published: 2014

Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security

Model inversion attacks that exploit confidence information and basic countermeasures

Matt Fredrikson, Somesh Jha, Thomas Ristenpart

Published: 2015

arxiv

Federated Learning

Deep Leakage from Gradients

Ligeng Zhu, Zhijian Liu, Song Han

Published: 2019.6.21

Exchanging gradients is a widely used method in modern multi-node machine learning system (e.g., distributed training, collaborative learning). For a long time, people believed that gradients are safe to share: i.e., the training data will not be leaked by gradient exchange. However, we show that it is possible to obtain the private training data from the publicly shared gradients. We name this leakage as Deep Leakage from Gradient and empirically validate the effectiveness on both computer vision and natural language processing tasks. Experimental results show that our attack is much stronger than previous approaches: the recovery is pixel-wise accurate for images and token-wise matching for texts. We want to raise people's awareness to rethink the gradient's safety. Finally, we discuss several possible strategies to prevent such deep leakage. The most effective defense method is gradient pruning.

Privacy protection, defensive deception, adversarial attacks

iDLG: Improved deep leakage from gradients

Bo Zhao, Konda Reddy Mopuri, Hakan Bilen

Published: 2020

Advances in Neural Information Processing Systems (NeurIPS)

Inverting gradients — how easy is it to break privacy in federated learning?

Jonas Geiping, Hartmut Bauermeister, Hannah Dröge, Michael Moeller

Published: 2020

NeurIPS ML Safety Workshop

Hidden poison: Machine unlearning enables camouflaged poisoning attacks

J. Z. Di, J. Douglas, J. Acharya, G. Kamath, A. Sekhari

Published: 2022

USENIX Security Symposium

Privacy in pharmacogenetics: An end-to-end case study of personalized warfarin dosing

Matthew Fredrikson, Eric Lantz, Somesh Jha, Simon Lin, David Page, Thomas Ristenpart

Published: 2014

arxiv

USENIX Security Symposium

The Secret Sharer: Evaluating and Testing Unintended Memorization in Neural Networks

Nicholas Carlini, Chang Liu, Úlfar Erlingsson, Jernej Kos, Dawn Song

Published: 2018.2.23

This paper describes a testing methodology for quantitatively assessing the risk that rare or unique training-data sequences are unintentionally memorized by generative sequence models, a common type of machine-learning model. Because such models are sometimes trained on sensitive data (e.g., the text of users' private messages), this methodology can benefit privacy by allowing deep-learning practitioners to select means of training that minimize such memorization. In experiments, we show that unintended memorization is a persistent, hard-to-avoid issue that can have serious consequences. Specifically, for models trained without consideration of memorization, we describe new, efficient procedures that can extract unique, secret sequences, such as credit card numbers. We show that our testing strategy is a practical and easy-to-use first line of defense, e.g., by describing its application to quantitatively limit data exposure in Google's Smart Compose, a commercial text-completion neural network trained on millions of users' email messages.

Differential privacy, privacy protection mechanisms, information-theoretic evaluation

arXiv preprint

Deletion inference, reconstruction, and compliance in machine (un)learning

J. Gao, S. Garg, M. Mahmoody, P. N. Vasudevan

Published: 2022

2023 IEEE 36th Computer Security Foundations Symposium (CSF)

SoK: Model inversion attack landscape: Taxonomy, challenges, and future roadmap

S. V. Dibbo

Published: 2023

arxiv

Updates-Leak: Data Set Inference and Reconstruction Attacks in Online Learning

Ahmed Salem, Apratim Bhattacharya, Michael Backes, Mario Fritz, Yang Zhang

Published: 2019.4.2

Machine learning (ML) has progressed rapidly during the past decade and the major factor that drives such development is the unprecedented large-scale data. As data generation is a continuous process, this leads to ML model owners updating their models frequently with newly-collected data in an online learning scenario. In consequence, if an ML model is queried with the same set of data samples at two different points in time, it will provide different results. In this paper, we investigate whether the change in the output of a black-box ML model before and after being updated can leak information of the dataset used to perform the update, namely the updating set. This constitutes a new attack surface against black-box ML models and such information leakage may compromise the intellectual property and data privacy of the ML model owner. We propose four attacks following an encoder-decoder formulation, which allows inferring diverse information of the updating set. Our new attacks are facilitated by state-of-the-art deep learning techniques. In particular, we propose a hybrid generative model (CBM-GAN) that is based on generative adversarial networks (GANs) but includes a reconstructive loss that allows reconstructing accurate samples. Our experiments show that the proposed attacks achieve strong performance.

Model extraction attacks, reconstruction attacks, adversarial attack detection

arxiv

Communication-Efficient Learning of Deep Networks from Decentralized Data

H. Brendan McMahan, Eider Moore, Daniel Ramage, Seth Hampson, Blaise Agüera y Arcas

Published: 2016.2.18

Modern mobile devices have access to a wealth of data suitable for learning models, which in turn can greatly improve the user experience on the device. For example, language models can improve speech recognition and text entry, and image models can automatically select good photos. However, this rich data is often privacy sensitive, large in quantity, or both, which may preclude logging to the data center and training there using conventional approaches. We advocate an alternative that leaves the training data distributed on the mobile devices, and learns a shared model by aggregating locally-computed updates. We term this decentralized approach Federated Learning. We present a practical method for the federated learning of deep networks based on iterative model averaging, and conduct an extensive empirical evaluation, considering five different model architectures and four datasets. These experiments demonstrate the approach is robust to the unbalanced and non-IID data distributions that are a defining characteristic of this setting. Communication costs are the principal constraint, and we show a reduction in required communication rounds by 10-100x as compared to synchronized stochastic gradient descent.
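
The iterative model averaging mentioned in the abstract above can be sketched as a single server-side aggregation step. Weighting each client by its number of local examples follows the usual description of federated averaging; the function name `federated_average`, the flat-list representation of weights, and the toy numbers are assumptions made for illustration.

```python
def federated_average(client_weights, client_sizes):
    """Average client weight vectors, weighted by local dataset size."""
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need one size per client and at least one client")
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(dim)
    ]


# Hypothetical round: two clients holding 1 and 3 local examples respectively.
updates = [[1.0, 2.0], [3.0, 4.0]]
sizes = [1, 3]
print(federated_average(updates, sizes))  # [2.5, 3.5]
```

The server would send the averaged vector back to the clients and repeat the step each round; only weight updates, not raw data, cross the network.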

Deep learning methods, federated learning, communication cost reduction

International Conference on Learning Representations

R-GAP: Recursive gradient attack on privacy

J. Zhu, M. B. Blaschko

Published: 2020

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition

See through gradients: Image batch recovery via GradInversion

Hongxu Yin, Arun Mallya, Arash Vahdat, Jose M Alvarez, Jan Kautz, Pavlo Molchanov

Published: 2021

arxiv

USENIX Security Symposium

Gradient Obfuscation Gives a False Sense of Security in Federated Learning

Kai Yue, Richeng Jin, Chau-Wai Wong, Dror Baron, Huaiyu Dai

Published: 2022.6.8

Federated learning has been proposed as a privacy-preserving machine learning framework that enables multiple clients to collaborate without sharing raw data. However, client privacy protection is not guaranteed by design in this framework. Prior work has shown that the gradient sharing strategies in federated learning can be vulnerable to data reconstruction attacks. In practice, though, clients may not transmit raw gradients considering the high communication cost or due to privacy enhancement requirements. Empirical studies have demonstrated that gradient obfuscation, including intentional obfuscation via gradient noise injection and unintentional obfuscation via gradient compression, can provide more privacy protection against reconstruction attacks. In this work, we present a new data reconstruction attack framework targeting the image classification task in federated learning. We show that commonly adopted gradient postprocessing procedures, such as gradient quantization, gradient sparsification, and gradient perturbation, may give a false sense of security in federated learning. Contrary to prior studies, we argue that privacy enhancement should not be treated as a byproduct of gradient compression. Additionally, we design a new method under the proposed framework to reconstruct the image at the semantic level. We quantify the semantic privacy leakage and compare it with conventional measures based on image similarity scores. Our comparisons challenge the image data leakage evaluation schemes in the literature. The results emphasize the importance of revisiting and redesigning the privacy protection mechanisms for client data in existing federated learning algorithms.

Reconstruction, robustness, poisoning, attack methods against DFL

arxiv

Beyond Inferring Class Representatives: User-Level Privacy Leakage From Federated Learning

Zhibo Wang, Mengkai Song, Zhifei Zhang, Yang Song, Qian Wang, Hairong Qi

Published: 2018.12.3

Federated learning, i.e., a mobile edge computing framework for deep learning, is a recent advance in privacy-preserving machine learning, where the model is trained in a decentralized manner by the clients, i.e., data curators, preventing the server from directly accessing those private data from the clients. This learning mechanism significantly challenges the attack from the server side. Although the state-of-the-art attacking techniques that incorporated the advance of generative adversarial networks (GANs) could construct class representatives of the global data distribution among all clients, it is still challenging to distinguishably attack a specific client (i.e., user-level privacy leakage), which is a stronger privacy threat to precisely recover the private data from a specific client. This paper gives the first attempt to explore user-level privacy leakage against federated learning by the attack from a malicious server. We propose a framework incorporating GAN with a multi-task discriminator, which simultaneously discriminates category, reality, and client identity of input samples. The novel discrimination on client identity enables the generator to recover user-specified private data. Unlike existing works that tend to interfere with the training process of federated learning, the proposed method works "invisibly" on the server side. The experimental results demonstrate the effectiveness of the proposed attacking approach and its superiority over the state of the art.

Federated learning, differential privacy

2021 IEEE 18th Annual Consumer Communications & Networking Conference (CCNC)

Label leakage from gradients in distributed machine learning

A. Wainakh, T. Müßig, T. Grube, M. Mühlhäuser

Published: 2021

arxiv

IEEE Symposium on Security and Privacy

Towards Evaluating the Robustness of Neural Networks

Nicholas Carlini, David Wagner

Published: 2016.8.17

Neural networks provide state-of-the-art results for most machine learning tasks. Unfortunately, neural networks are vulnerable to adversarial examples: given an input x and any target classification t, it is possible to find a new input x' that is similar to x but classified as t. This makes it difficult to apply neural networks in security-critical areas. Defensive distillation is a recently proposed approach that can take an arbitrary neural network, and increase its robustness, reducing the success rate of current attacks' ability to find adversarial examples from 95% to 0.5%. In this paper, we demonstrate that defensive distillation does not significantly increase the robustness of neural networks by introducing three new attack algorithms that are successful on both distilled and undistilled neural networks with 100% probability. Our attacks are tailored to three distance metrics used previously in the literature, and when compared to previous adversarial example generation algorithms, our attacks are often much more effective (and never worse). Furthermore, we propose using high-confidence adversarial examples in a simple transferability test we show can also be used to break defensive distillation. We hope our attacks will be used as a benchmark in future defense attempts to create neural networks that resist adversarial examples.

Model robustness, adversarial examples, model robustness guarantees

arxiv

European Symposium on Security and Privacy (EuroS&P)

The Limitations of Deep Learning in Adversarial Settings

Nicolas Papernot, Patrick McDaniel, Somesh Jha, Matt Fredrikson, Z. Berkay Celik, Ananthram Swami

Published: 2015.11.24

Deep learning takes advantage of large datasets and computationally efficient training algorithms to outperform other approaches at various machine learning tasks. However, imperfections in the training phase of deep neural networks make them vulnerable to adversarial samples: inputs crafted by adversaries with the intent of causing deep neural networks to misclassify. In this work, we formalize the space of adversaries against deep neural networks (DNNs) and introduce a novel class of algorithms to craft adversarial samples based on a precise understanding of the mapping between inputs and outputs of DNNs. In an application to computer vision, we show that our algorithms can reliably produce samples correctly classified by human subjects but misclassified in specific targets by a DNN with a 97% adversarial success rate while only modifying on average 4.02% of the input features per sample. We then evaluate the vulnerability of different sample classes to adversarial perturbations by defining a hardness measure. Finally, we describe preliminary work outlining defenses against adversarial samples by defining a predictive measure of distance between a benign input and a target classification.

Adversarial examples, deep learning models, adversarial example detection

Proceedings of the 2017 ACM on Asia Conference on Computer and Communications Security

Practical black-box attacks against machine learning

N. Papernot, P. McDaniel, I. Goodfellow, S. Jha, Z. B. Celik, A. Swami

Published: 2017

arxiv

AISec@CCS

ZOO: Zeroth Order Optimization based Black-box Attacks to Deep Neural Networks without Training Substitute Models

Pin-Yu Chen, Huan Zhang, Yash Sharma, Jinfeng Yi, Cho-Jui Hsieh

Published: 2017.8.14

Deep neural networks (DNNs) are one of the most prominent technologies of our time, as they achieve state-of-the-art performance in many machine learning tasks, including but not limited to image classification, text mining, and speech processing. However, recent research on DNNs has indicated ever-increasing concern on the robustness to adversarial examples, especially for security-critical tasks such as traffic sign identification for autonomous driving. Studies have unveiled the vulnerability of a well-trained DNN by demonstrating the ability of generating barely noticeable (to both human and machines) adversarial images that lead to misclassification. Furthermore, researchers have shown that these adversarial images are highly transferable by simply training and attacking a substitute model built upon the target model, known as a black-box attack to DNNs. Similar to the setting of training substitute models, in this paper we propose an effective black-box attack that also only has access to the input (images) and the output (confidence scores) of a targeted DNN. However, different from leveraging attack transferability from substitute models, we propose zeroth order optimization (ZOO) based attacks to directly estimate the gradients of the targeted DNN for generating adversarial examples. We use zeroth order stochastic coordinate descent along with dimension reduction, hierarchical attack and importance sampling techniques to efficiently attack black-box models. By exploiting zeroth order optimization, improved attacks to the targeted DNN can be accomplished, sparing the need for training substitute models and avoiding the loss in attack transferability. Experimental results on MNIST, CIFAR10 and ImageNet show that the proposed ZOO attack is as effective as the state-of-the-art white-box attack and significantly outperforms existing black-box attacks via substitute models.
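
The idea of estimating a black-box model's gradient from queries alone, as described in the abstract above, can be illustrated with a symmetric difference quotient applied one coordinate at a time. The sketch below estimates every coordinate in turn rather than sampling coordinates stochastically, and it omits the dimension reduction and importance sampling the abstract mentions; the function name, the step size `h`, and the toy objective are assumptions for illustration.

```python
def estimate_gradient(f, x, h=1e-4):
    """Estimate the gradient of a black-box scalar function f at x
    using two queries per coordinate (symmetric difference quotient)."""
    grad = []
    for i in range(len(x)):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[i] += h
        x_minus[i] -= h
        grad.append((f(x_plus) - f(x_minus)) / (2 * h))
    return grad


# Hypothetical stand-in for a model's loss: only its outputs are observed.
def black_box(x):
    return sum(v * v for v in x)


print(estimate_gradient(black_box, [1.0, -2.0]))  # close to [2.0, -4.0]
```

Each coordinate costs two queries, which is why query-efficient attacks on high-dimensional inputs restrict or subsample the coordinates they probe.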

Poisoning, attack methods, model robustness guarantees

arxiv

HopSkipJumpAttack: A Query-Efficient Decision-Based Attack

Jianbo Chen, Michael I. Jordan, Martin J. Wainwright

Published: 2019.4.4

The goal of a decision-based adversarial attack on a trained model is to generate adversarial examples based solely on observing output labels returned by the targeted model. We develop HopSkipJumpAttack, a family of algorithms based on a novel estimate of the gradient direction using binary information at the decision boundary. The proposed family includes both untargeted and targeted attacks optimized for $\ell_2$ and $\ell_\infty$ similarity metrics respectively. Theoretical analysis is provided for the proposed algorithms and the gradient direction estimate. Experiments show HopSkipJumpAttack requires significantly fewer model queries than Boundary Attack. It also achieves competitive performance in attacking several widely-used defense mechanisms. (HopSkipJumpAttack was named Boundary Attack++ in a previous version of the preprint.)

Adversarial attacks, adversarial examples, distance evaluation methods

arxiv

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

QEBA: Query-Efficient Boundary-Based Blackbox Attack

Huichen Li, Xiaojun Xu, Xiaolu Zhang, Shuang Yang, Bo Li

Published: 2020.5.29

Machine learning (ML), especially deep neural networks (DNNs) have been widely used in various applications, including several safety-critical ones (e.g. autonomous driving). As a result, recent research about adversarial examples has raised great concerns. Such adversarial attacks can be achieved by adding a small magnitude of perturbation to the input to mislead model prediction. While several whitebox attacks have demonstrated their effectiveness, which assume that the attackers have full access to the machine learning models; blackbox attacks are more realistic in practice. In this paper, we propose a Query-Efficient Boundary-based blackbox Attack (QEBA) based only on model's final prediction labels. We theoretically show why previous boundary-based attack with gradient estimation on the whole gradient space is not efficient in terms of query numbers, and provide optimality analysis for our dimension reduction-based gradient estimation. On the other hand, we conducted extensive experiments on ImageNet and CelebA datasets to evaluate QEBA. We show that compared with the state-of-the-art blackbox attacks, QEBA is able to use a smaller number of queries to achieve a lower magnitude of perturbation with 100% attack success rate. We also show case studies of attacks on real-world APIs including MEGVII Face++ and Microsoft Azure.

Adversarial attack methods, dimensionality reduction methods, privacy protection in machine learning

Learning multiple layers of features from tiny images

Alex Krizhevsky, Geoffrey Hinton

Published: 2009

Proceedings of the 14th International Conference on Artificial Intelligence and Statistics

An analysis of single-layer networks in unsupervised feature learning

A. Coates, A. Ng, H. Lee

Published: 2011

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition

Quo vadis, action recognition? A new model and the Kinetics dataset

J. Carreira, A. Zisserman

Published: 2017

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition

Pyramid scene parsing network

H. Zhao, J. Shi, X. Qi, X. Wang, J. Jia

Published: 2017

Proceedings of the IEEE/CVF International Conference on Computer Vision

Rethinking ImageNet pre-training

K. He, R. Girshick, P. Dollar

Published: 2019

Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security

Deep learning with differential privacy

M. Abadi, A. Chu, I. Goodfellow, H. B. McMahan, I. Mironov, K. Talwar, L. Zhang

Published: 2016

arXiv preprint

To prune, or not to prune: exploring the efficacy of pruning for model compression

M. Zhu, S. Gupta

Published: 2017

25th USENIX Security Symposium (USENIX Security 16)

Stealing machine learning models via prediction APIs

F. Tramer, F. Zhang, A. Juels, M. K. Reiter, T. Ristenpart

Published: 2016

arXiv

Cited by: 1

Membership Leakage in Label-Only Exposures

Zheng Li, Yang Zhang

Published: 2020.7.31

Machine learning (ML) has been widely adopted in various privacy-critical applications, e.g., face recognition and medical image analysis. However, recent research has shown that ML models are vulnerable to attacks against their training data. Membership inference is one major attack in this domain: given a data sample and a model, an adversary aims to determine whether the sample is part of the model's training set. Existing membership inference attacks leverage the confidence scores returned by the model as their inputs (score-based attacks). However, these attacks can be easily mitigated if the model only exposes the predicted label, i.e., the final model decision. In this paper, we propose decision-based membership inference attacks and demonstrate that label-only exposures are also vulnerable to membership leakage. In particular, we develop two types of decision-based attacks, namely a transfer attack and a boundary attack. Empirical evaluation shows that our decision-based attacks can achieve remarkable performance, and even outperform the previous score-based attacks in some cases. We further present new insights on the success of membership inference based on quantitative and qualitative analysis, i.e., member samples of a model are more distant from the model's decision boundary than non-member samples. Finally, we evaluate multiple defense mechanisms against our decision-based attacks and show that our two types of attacks can bypass most of these defenses.

Membership inference, attack methods, performance evaluation

arXiv

Cited by: 1

International Conference on Machine Learning (ICML)

Label-Only Membership Inference Attacks

Christopher A. Choquette-Choo, Florian Tramer, Nicholas Carlini, Nicolas Papernot

Published: 2020.7.29

Membership inference attacks are one of the simplest forms of privacy leakage for machine learning models: given a data point and model, determine whether the point was used to train the model. Existing membership inference attacks exploit models' abnormal confidence when queried on their training data. These attacks do not apply if the adversary only gets access to models' predicted labels, without a confidence measure. In this paper, we introduce label-only membership inference attacks. Instead of relying on confidence scores, our attacks evaluate the robustness of a model's predicted labels under perturbations to obtain a fine-grained membership signal. These perturbations include common data augmentations or adversarial examples. We empirically show that our label-only membership inference attacks perform on par with prior attacks that required access to model confidences. We further demonstrate that label-only attacks break multiple defenses against membership inference attacks that (implicitly or explicitly) rely on a phenomenon we call confidence masking. These defenses modify a model's confidence scores in order to thwart attacks, but leave the model's predicted labels unchanged. Our label-only attacks demonstrate that confidence masking is not a viable defense strategy against membership inference. Finally, we investigate worst-case label-only attacks that infer membership for a small number of outlier data points. We show that label-only attacks also match confidence-based attacks in this setting. We find that training models with differential privacy and (strong) L2 regularization are the only known defense strategies that successfully prevent all attacks. This remains true even when the differential privacy budget is too high to offer meaningful provable guarantees.

Attack methods, backdoor attacks, membership inference

arXiv

Cited by: 1

USENIX Security Symposium

Systematic Evaluation of Privacy Risks of Machine Learning Models

Liwei Song, Prateek Mittal

Published: 2020.3.24

Machine learning models are prone to memorizing sensitive data, making them vulnerable to membership inference attacks in which an adversary aims to guess if an input sample was used to train the model. In this paper, we show that prior work on membership inference attacks may severely underestimate the privacy risks by relying solely on training custom neural network classifiers to perform attacks and focusing only on the aggregate results over data samples, such as the attack accuracy. To overcome these limitations, we first propose to benchmark membership inference privacy risks by improving existing non-neural network based inference attacks and proposing a new inference attack method based on a modification of prediction entropy. We also propose benchmarks for defense mechanisms by accounting for adaptive adversaries with knowledge of the defense and also accounting for the trade-off between model accuracy and privacy risks. Using our benchmark attacks, we demonstrate that existing defense approaches are not as effective as previously reported. Next, we introduce a new approach for fine-grained privacy analysis by formulating and deriving a new metric called the privacy risk score. Our privacy risk score metric measures an individual sample's likelihood of being a training member, which allows an adversary to identify samples with high privacy risks and perform attacks with high confidence. We experimentally validate the effectiveness of the privacy risk score and demonstrate that the distribution of privacy risk score across individual samples is heterogeneous. Finally, we perform an in-depth investigation for understanding why certain samples have high privacy risks, including correlations with model sensitivity, generalization error, and feature embeddings. Our work emphasizes the importance of a systematic and rigorous evaluation of privacy risks of machine learning models.

Membership inference, defense methods, privacy protection methods

2020 IEEE European Symposium on Security and Privacy (EuroS&P)

A pragmatic approach to membership inferences on machine learning models

Y. Long, L. Wang, D. Bu, V. Bindschaedler, X. Wang, H. Tang, C. A. Gunter, K. Chen

Published: 2020

Proceedings of the 2022 ACM on Asia Conference on Computer and Communications Security

Understanding disparate effects of membership inference attacks and their countermeasures

D. Zhong, H. Sun, J. Xu, N. Gong, W. H. Wang

Published: 2022

2022 IEEE Symposium on Security and Privacy (S&P)

Membership inference attacks from first principles

N. Carlini, S. Chien, M. Nasr, S. Song, A. Terzis, F. Tramer

Published: 2022

The unreasonable effectiveness of deep features as a perceptual metric

Richard Zhang, Phillip Isola, Alexei A. Efros, Eli Shechtman, Oliver Wang

Published: 2018

3rd International Conference on Learning Representations

Very deep convolutional networks for large-scale image recognition

K. Simonyan, A. Zisserman

Published: 2015

Mathematical Programming

Representations of quasi-Newton matrices and their use in limited memory methods

R. H. Byrd, J. Nocedal, R. B. Schnabel

Published: 1994