Evaluating Adversarial Robustness: A Comparison Of FGSM, Carlini-Wagner Attacks, And The Role of Distillation as Defense Mechanism

European Symposium on Security and Privacy (EuroS&P)

The Limitations of Deep Learning in Adversarial Settings

Nicolas Papernot, Patrick McDaniel, Somesh Jha, Matt Fredrikson, Z. Berkay Celik, Ananthram Swami

Published: 2015.11.24

Deep learning takes advantage of large datasets and computationally efficient training algorithms to outperform other approaches at various machine learning tasks. However, imperfections in the training phase of deep neural networks make them vulnerable to adversarial samples: inputs crafted by adversaries with the intent of causing deep neural networks to misclassify. In this work, we formalize the space of adversaries against deep neural networks (DNNs) and introduce a novel class of algorithms to craft adversarial samples based on a precise understanding of the mapping between inputs and outputs of DNNs. In an application to computer vision, we show that our algorithms can reliably produce samples correctly classified by human subjects but misclassified in specific targets by a DNN with a 97% adversarial success rate while only modifying on average 4.02% of the input features per sample. We then evaluate the vulnerability of different sample classes to adversarial perturbations by defining a hardness measure. Finally, we describe preliminary work outlining defenses against adversarial samples by defining a predictive measure of distance between a benign input and a target classification.

Keywords: adversarial examples, deep learning models, adversarial example detection

arXiv (Cornell University)

Adversarial patch

Brown, T. B., Mané, D., Roy, A., Abadi, M., Gilmer, J.

Universal Adversarial Perturbations

IEEE

Moosavi-Dezfooli, S., Fawzi, A., Fawzi, O., Frossard, P.

IEEE Symposium on Security and Privacy

Towards Evaluating the Robustness of Neural Networks

Nicholas Carlini, David Wagner

Published: 2016.8.17

Neural networks provide state-of-the-art results for most machine learning tasks. Unfortunately, neural networks are vulnerable to adversarial examples: given an input $x$ and any target classification $t$, it is possible to find a new input $x'$ that is similar to $x$ but classified as $t$. This makes it difficult to apply neural networks in security-critical areas. Defensive distillation is a recently proposed approach that can take an arbitrary neural network, and increase its robustness, reducing the success rate of current attacks' ability to find adversarial examples from $95\%$ to $0.5\%$. In this paper, we demonstrate that defensive distillation does not significantly increase the robustness of neural networks by introducing three new attack algorithms that are successful on both distilled and undistilled neural networks with $100\%$ probability. Our attacks are tailored to three distance metrics used previously in the literature, and when compared to previous adversarial example generation algorithms, our attacks are often much more effective (and never worse). Furthermore, we propose using high-confidence adversarial examples in a simple transferability test we show can also be used to break defensive distillation. We hope our attacks will be used as a benchmark in future defense attempts to create neural networks that resist adversarial examples.

Keywords: model robustness, adversarial examples, model robustness guarantees
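
The Carlini and Wagner abstract above refers to "three distance metrics" without naming them. In the paper these are the L0, L2, and L-infinity distances between the original input and the adversarial one. The sketch below computes each for a pair of flattened inputs. The function names and the toy vectors are illustrative assumptions, not taken from the paper.

```python
import math

def l0_distance(x, x_adv, tol=0.0):
    """Count the coordinates changed by more than tol (L0 'norm')."""
    return sum(1 for a, b in zip(x, x_adv) if abs(a - b) > tol)

def l2_distance(x, x_adv):
    """Euclidean distance between the two inputs."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, x_adv)))

def linf_distance(x, x_adv):
    """Largest change applied to any single coordinate."""
    return max(abs(a - b) for a, b in zip(x, x_adv))

# Hypothetical 4-feature input and a perturbed copy of it.
x = [0.0, 0.5, 1.0, 0.25]
x_adv = [0.0, 0.5, 0.5, 0.0]

print(l0_distance(x, x_adv))    # two features were modified
print(l2_distance(x, x_adv))    # overall Euclidean size of the change
print(linf_distance(x, x_adv))  # largest single-feature change
```

An attack that minimizes one metric can look large under another: changing a few features a lot keeps L0 small while L-infinity grows, which is why a defense is typically evaluated against all three.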

arXiv (Cornell University)

Synthesizing robust adversarial examples

Athalye, A., Engstrom, L., Ilyas, A., Kwok, K. S.

arXiv (Cornell University)

Towards deep learning models resistant to adversarial attacks

Mądry, A., Makelov, A., Schmidt, L., Tsipras, D., Vladu, A.

Published: 2018

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Boosting Adversarial Attacks with Momentum

Yinpeng Dong, Fangzhou Liao, Tianyu Pang, Hang Su, Jun Zhu, Xiaolin Hu, Jianguo Li

Published: 2017.10.17

Deep neural networks are vulnerable to adversarial examples, which poses security concerns on these algorithms due to the potentially severe consequences. Adversarial attacks serve as an important surrogate to evaluate the robustness of deep learning models before they are deployed. However, most of existing adversarial attacks can only fool a black-box model with a low success rate. To address this issue, we propose a broad class of momentum-based iterative algorithms to boost adversarial attacks. By integrating the momentum term into the iterative process for attacks, our methods can stabilize update directions and escape from poor local maxima during the iterations, resulting in more transferable adversarial examples. To further improve the success rates for black-box attacks, we apply momentum iterative algorithms to an ensemble of models, and show that the adversarially trained models with a strong defense ability are also vulnerable to our black-box attacks. We hope that the proposed methods will serve as a benchmark for evaluating the robustness of various deep models and defense methods. With this method, we won the first places in NIPS 2017 Non-targeted Adversarial Attack and Targeted Adversarial Attack competitions.

Keywords: model robustness guarantees, robustness improvement methods, adversarial example detection
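
The abstract of "Boosting Adversarial Attacks with Momentum" describes adding a momentum term to an iterative attack. A minimal sketch of that idea follows, assuming the commonly cited form of the update: the gradient is normalized by its L1 norm, accumulated with decay factor `mu`, and the input moves by a fixed step along the sign of the accumulated vector while staying inside an L-infinity ball of radius `eps`. The name `mi_fgsm`, the `grad_fn` callback, and the toy linear loss are assumptions made for illustration. A real attack would take the gradient of the model's loss and also clip to the valid pixel range.

```python
def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)

def mi_fgsm(x0, grad_fn, eps, steps=10, mu=1.0):
    """Momentum-based iterative sign attack (untargeted, L-infinity)."""
    alpha = eps / steps              # per-step size so the total budget is eps
    x = list(x0)
    g = [0.0] * len(x0)              # accumulated (momentum) gradient
    for _ in range(steps):
        grad = grad_fn(x)
        norm1 = sum(abs(v) for v in grad) or 1.0   # avoid division by zero
        g = [mu * gi + gr / norm1 for gi, gr in zip(g, grad)]
        x = [xi + alpha * sign(gi) for xi, gi in zip(x, g)]
        # Project back into the eps-ball around the original input.
        x = [min(max(xi, oi - eps), oi + eps) for xi, oi in zip(x, x0)]
    return x

# Toy loss L(x) = w . x, whose gradient is the constant vector w.
w = [1.0, -2.0, 0.0]
x0 = [0.5, 0.5, 0.5]
x_adv = mi_fgsm(x0, lambda x: w, eps=0.1)
print(x_adv)
```

With `mu=0` the accumulated vector is just the current normalized gradient, so the loop reduces to a plain iterative sign attack. The momentum term matters when gradients change direction between steps, which the constant-gradient toy loss here does not exercise.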

International Conference on Machine Learning (ICML)

Cited by: 19

Reliable evaluation of adversarial robustness with an ensemble of diverse parameter-free attacks

Francesco Croce, Matthias Hein

Published: 2020.3.4

The field of defense strategies against adversarial attacks has significantly grown over the last years, but progress is hampered as the evaluation of adversarial defenses is often insufficient and thus gives a wrong impression of robustness. Many promising defenses could be broken later on, making it difficult to identify the state-of-the-art. Frequent pitfalls in the evaluation are improper tuning of hyperparameters of the attacks, gradient obfuscation or masking. In this paper we first propose two extensions of the PGD-attack overcoming failures due to suboptimal step size and problems of the objective function. We then combine our novel attacks with two complementary existing ones to form a parameter-free, computationally affordable and user-independent ensemble of attacks to test adversarial robustness. We apply our ensemble to over 50 models from papers published at recent top machine learning and computer vision venues. In all except one of the cases we achieve lower robust test accuracy than reported in these papers, often by more than $10\%$, identifying several broken defenses.

Keywords: adversarial perturbation methods, robustness evaluation, defense methods
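
The Croce and Hein abstract describes testing robustness with an ensemble of attacks. One way to read that, sketched here as an assumption rather than as the authors' implementation, is a per-sample worst case: a test point counts as robust only if every attack in the ensemble fails on it. The attack names and the outcome table below are hypothetical.

```python
def robust_accuracy(correct_under_attack):
    """Worst-case accuracy over an ensemble of attacks.

    correct_under_attack maps an attack name to a list of booleans,
    where entry i is True if sample i is still classified correctly
    after that attack.
    """
    outcomes = list(correct_under_attack.values())
    n = len(outcomes[0])
    # A sample survives only if no attack in the ensemble breaks it.
    survived = [all(row[i] for row in outcomes) for i in range(n)]
    return sum(survived) / n

# Hypothetical outcomes for four test samples under two attacks.
results = {
    "attack_a": [True, True, False, True],
    "attack_b": [True, False, True, True],
}

per_attack = {name: sum(row) / len(row) for name, row in results.items()}
print(per_attack)                  # each attack alone leaves 75% accuracy
print(robust_accuracy(results))    # the ensemble leaves only 50%
```

The gap between the per-attack and ensemble figures shows why a single weak or badly tuned attack can overstate robustness.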

IEEE Symposium on Security and Privacy

Distillation as a Defense to Adversarial Perturbations against Deep Neural Networks

Nicolas Papernot, Patrick McDaniel, Xi Wu, Somesh Jha, Ananthram Swami

Published: 2015.11.14

Deep learning algorithms have been shown to perform extremely well on many classical machine learning problems. However, recent studies have shown that deep learning, like other machine learning techniques, is vulnerable to adversarial samples: inputs crafted to force a deep neural network (DNN) to provide adversary-selected outputs. Such attacks can seriously undermine the security of the system supported by the DNN, sometimes with devastating consequences. For example, autonomous vehicles can be crashed, illicit or illegal content can bypass content filters, or biometric authentication systems can be manipulated to allow improper access. In this work, we introduce a defensive mechanism called defensive distillation to reduce the effectiveness of adversarial samples on DNNs. We analytically investigate the generalizability and robustness properties granted by the use of defensive distillation when training DNNs. We also empirically study the effectiveness of our defense mechanisms on two DNNs placed in adversarial settings. The study shows that defensive distillation can reduce effectiveness of sample creation from 95% to less than 0.5% on a studied DNN. Such dramatic gains can be explained by the fact that distillation leads gradients used in adversarial sample creation to be reduced by a factor of 10^30. We also find that distillation increases the average minimum number of features that need to be modified to create adversarial samples by about 800% on one of the DNNs we tested.

Keywords: model robustness guarantees, deep learning, adversarial examples
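
The defensive distillation abstract does not spell out the mechanism. In the paper, the key ingredient is a softmax with a temperature parameter T applied to the logits during training. The sketch below only shows what T does to the output probabilities. The function name and the example logits are illustrative assumptions, and the sketch is not a reproduction of the training procedure.

```python
import math

def softmax_with_temperature(logits, T=1.0):
    """Softmax over logits divided by temperature T (numerically stable)."""
    scaled = [z / T for z in logits]
    m = max(scaled)                      # subtract the max to avoid overflow
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [10.0, 0.0, -10.0]              # hypothetical network outputs

p_sharp = softmax_with_temperature(logits, T=1.0)
p_soft = softmax_with_temperature(logits, T=20.0)

print(p_sharp)   # nearly one-hot: almost all mass on the first class
print(p_soft)    # much flatter distribution over the three classes
```

A high temperature produces soft labels that carry information about class similarity. When the output is nearly one-hot, as in the T=1 case here, the softmax is saturated and its gradient with respect to the input becomes very small, which is consistent with the gradient reduction the abstract reports.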

ACM

Practical Black-Box Attacks against Machine Learning

Papernot, N., McDaniel, P., Goodfellow, I. J., Jha, S., Celik, Z. B., Swami, A.

MNIST handwritten digit database

AT&T Labs

LeCun, Yann, Cortes, Corinna, Burges, Christopher J.C.

Published: 2010

Learning multiple layers of features from tiny images

Alex Krizhevsky, Geoffrey Hinton

Published: 2009

2009 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR)

Imagenet: A large-scale hierarchical image database

J. Deng, W. Dong, R. Socher, L. Li, K. Li, L. Fei-Fei

Published: 2009

IJCV

Imagenet large scale visual recognition challenge

Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., Huang, Z., Karpathy, A., Khosla, A., Bernstein, M., et al.

Published: 2015

CVPR

Deepfool: a simple and accurate method to fool deep neural networks

S.-M. Moosavi-Dezfooli, A. Fawzi, P. Frossard

Published: 2016

Attacking machine learning with adversarial examples

PyImageSearch

Adversarial attacks with FGSM (Fast Gradient Sign Method)

Rosebrock, A.

Published: 2023

Medium

Adversarial attacks explained (And how to defend ML models against them)

Sciforce

Published: 2022

Adversarial Example Generation — PyTorch Tutorials 2.2.1+cu121 documentation

DeepAI

Defensive distillation

DeepAI

Published: 2020

3rd International Conference on Learning Representations

Very deep convolutional networks for large-scale image recognition

K. Simonyan, A. Zisserman

Published: 2015

CVPR

Densely connected convolutional networks

G. Huang, Z. Liu, L. Van Der Maaten, K. Q. Weinberger

Published: 2017

2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017

Aggregated residual transformations for deep neural networks

Saining Xie, Ross B. Girshick, Piotr Dollar, Zhuowen Tu, Kaiming He