Robust Image Classification: Defensive Strategies against FGSM and PGD Adversarial Attacks

Proceedings of the International Conference on Learning Representations

Explaining and harnessing adversarial examples

I. J. Goodfellow, J. Shlens, C. Szegedy

Published: 2015

arXiv

Cited by 45

Towards Deep Learning Models Resistant to Adversarial Attacks

Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, Adrian Vladu

Published: 2017.6.20

Recent work has demonstrated that deep neural networks are vulnerable to adversarial examples---inputs that are almost indistinguishable from natural data and yet classified incorrectly by the network. In fact, some of the latest findings suggest that the existence of adversarial attacks may be an inherent weakness of deep learning models. To address this problem, we study the adversarial robustness of neural networks through the lens of robust optimization. This approach provides us with a broad and unifying view on much of the prior work on this topic. Its principled nature also enables us to identify methods for both training and attacking neural networks that are reliable and, in a certain sense, universal. In particular, they specify a concrete security guarantee that would protect against any adversary. These methods let us train networks with significantly improved resistance to a wide range of adversarial attacks. They also suggest the notion of security against a first-order adversary as a natural and broad security guarantee. We believe that robustness against such well-defined classes of adversaries is an important stepping stone towards fully resistant deep learning models. Code and pre-trained models are available at https://github.com/MadryLab/mnist_challenge and https://github.com/MadryLab/cifar10_challenge.

Keywords: model robustness guarantees, adversarial examples, robustness evaluation
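The abstract above frames robustness as security against a first-order adversary but does not spell out an algorithm. As a minimal sketch of such an adversary, the block below runs projected gradient ascent on the loss inside an L-infinity ball, the procedure usually called PGD. The toy logistic model, the function names (`pgd_linf`, `input_gradient`), and the values of `eps`, `alpha`, and `steps` are all illustrative assumptions, not taken from the paper or its released code.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_loss(w, b, x, y):
    """Cross-entropy loss of a linear logistic model for a label y in {0, 1}."""
    p = sigmoid(x @ w + b)
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

def input_gradient(w, b, x, y):
    """Gradient of the loss with respect to the input x (not the weights)."""
    return (sigmoid(x @ w + b) - y) * w

def pgd_linf(w, b, x, y, eps=0.1, alpha=0.02, steps=20):
    """Projected gradient ascent on the loss inside an L-infinity ball around x."""
    x_adv = x.copy()
    for _ in range(steps):
        # Signed-gradient ascent step on the loss.
        x_adv = x_adv + alpha * np.sign(input_gradient(w, b, x_adv, y))
        # Project back into the eps-ball, then into the valid pixel range.
        x_adv = np.clip(x_adv, x - eps, x + eps)
        x_adv = np.clip(x_adv, 0.0, 1.0)
    return x_adv

# Toy usage with made-up weights and a made-up three-pixel "image".
w = np.array([1.0, -2.0, 0.5])
b = 0.0
x = np.array([0.5, 0.5, 0.5])
x_adv = pgd_linf(w, b, x, y=1)
print(logistic_loss(w, b, x, 1), logistic_loss(w, b, x_adv, 1))
```

With `steps=1` and `alpha=eps`, the loop takes a single signed-gradient step, which corresponds to an FGSM-style update. Adversarial training in the robust-optimization sense would then minimize the model's loss on `x_adv` rather than on `x`.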

arXiv

Cited by 1

IEEE Symposium on Security and Privacy

Distillation as a Defense to Adversarial Perturbations against Deep Neural Networks

Nicolas Papernot, Patrick McDaniel, Xi Wu, Somesh Jha, Ananthram Swami

Published: 2015.11.14

Deep learning algorithms have been shown to perform extremely well on many classical machine learning problems. However, recent studies have shown that deep learning, like other machine learning techniques, is vulnerable to adversarial samples: inputs crafted to force a deep neural network (DNN) to provide adversary-selected outputs. Such attacks can seriously undermine the security of the system supported by the DNN, sometimes with devastating consequences. For example, autonomous vehicles can be crashed, illicit or illegal content can bypass content filters, or biometric authentication systems can be manipulated to allow improper access. In this work, we introduce a defensive mechanism called defensive distillation to reduce the effectiveness of adversarial samples on DNNs. We analytically investigate the generalizability and robustness properties granted by the use of defensive distillation when training DNNs. We also empirically study the effectiveness of our defense mechanisms on two DNNs placed in adversarial settings. The study shows that defensive distillation can reduce effectiveness of sample creation from 95% to less than 0.5% on a studied DNN. Such dramatic gains can be explained by the fact that distillation leads gradients used in adversarial sample creation to be reduced by a factor of 10^30. We also find that distillation increases the average minimum number of features that need to be modified to create adversarial samples by about 800% on one of the DNNs we tested.

Keywords: model robustness guarantees, deep learning, adversarial examples
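The abstract above does not describe how distillation works. In the published method, the network is trained with a softmax at a raised temperature T, and the softened output probabilities serve as labels for a second network. The sketch below shows only that temperature-scaled softmax. The function name and the example logits are hypothetical.

```python
import numpy as np

def softmax_with_temperature(logits, T=1.0):
    """Softmax of logits / T; a larger T gives a flatter distribution."""
    z = np.asarray(logits, dtype=float) / T
    z = z - z.max()  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

# Made-up logits: compare the standard softmax with a high-temperature one.
logits = [2.0, 1.0, 0.1]
p_sharp = softmax_with_temperature(logits, T=1.0)
p_soft = softmax_with_temperature(logits, T=20.0)
print(p_sharp, p_soft)
```

The high-temperature probabilities are much closer to uniform, which is what makes them usable as soft labels.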

IntechOpen, London, UK

Adversarial attacks on Image classification models: FGSM and patch attacks and their impact

J. Sen, S. Dasgupta

Proc. of the 10th Int. Conf. on Business Analysis and Intelligence

Adversarial attacks on image classification models: Analysis and defense

J. Sen, A. Sen, A. Chatterjee

Published: 2023

Association for Computing Machinery

Adversarial examples are not easily detected: Bypassing ten detection methods

N. Carlini, D. Wagner

Published: 2017

AICAttack: Adversarial image captioning attack with attention-based optimization

J. Li, M. Ni, Y. Dong, T. Zhu, W. Liu

Published: 2024

Medical Image Analysis

Adversarial attack vulnerability of medical image analysis systems: Unexplored factors

Gerda Bortsova, Cristina González-Gonzalo, Suzanne C Wetstein, Florian Dubost, Ioannis Katramados, Laurens Hogeweg, Bart Liefers, Bram van Ginneken, Josien PW Pluim, Mitko Veta

Published: 2021

Proc of IEEE 24th International Conference on High Performance Computing & Communications; 8th International Conference on Data Science & Systems; 20th International Conference on Smart City; 8th International Conference on Dependability in Sensor, Cloud & Big Data Systems & Applications

A block gray adversarial attack method for image classification neural network

C. Li, C. Fan, J. Zhang, C. Li, Y. Teng

Published: 2022

On the robustness of large multimodal models against image adversarial attacks

Xuanming Cui, Alejandro Aparcedo, Young Kyun Jang, Ser-Nam Lim

Published: 2023

6th International Conference on Learning Representations (ICLR)

Ensemble adversarial training: Attacks and defenses

F. Tramer, A. Kurakin, N. Papernot, I. J. Goodfellow, D. Boneh, P. D. McDaniel

Published: 2018

Springer

Adversarial attack versus a bio-inspired defensive method for image classification

O. Garcia-Porras, S. Salazar-Colores, E.U. Moya-Sanchez, A. Sanchez-Perez

Published: 2023

International conference on machine learning

Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples

Anish Athalye, Nicholas Carlini, David Wagner

Published: 2018

Proc. of the 7th International Conference on Image, Vision and Computing

Adversarial attacks and defenses in image classification: A practical perspective

Y. Chen, M. Zhang, J. Li, X. Kuang

Published: 2022

Proc of 7th International Conference on Big Data Analytics

Defense against adversarial attacks using image label and pixel guided sparse denoiser

M. Li, C. Cao

Published: 2022

arXiv

Cited by 1

IEEE Military Communications Conference (MILCOM)

TensorShield: Tensor-based Defense Against Adversarial Attacks on Images

Negin Entezari, Evangelos E. Papalexakis

Published: 2020.2.18

Recent studies have demonstrated that machine learning approaches like deep neural networks (DNNs) are easily fooled by adversarial attacks. Subtle and imperceptible perturbations of the data are able to change the result of deep neural networks. Leveraging vulnerable machine learning methods raises many concerns especially in domains where security is an important factor. Therefore, it is crucial to design defense mechanisms against adversarial attacks. For the task of image classification, unnoticeable perturbations mostly occur in the high-frequency spectrum of the image. In this paper, we utilize tensor decomposition techniques as a preprocessing step to find a low-rank approximation of images which can significantly discard high-frequency perturbations. Recently a defense framework called Shield could "vaccinate" Convolutional Neural Networks (CNN) against adversarial examples by performing random-quality JPEG compressions on local patches of images on the ImageNet dataset. Our tensor-based defense mechanism outperforms the SLQ method from Shield by 14% against Fast Gradient Sign Method (FGSM) adversarial attacks, while maintaining comparable speed.

Keywords: defense methods, adversarial examples, performance evaluation
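The low-rank preprocessing idea in the abstract above can be illustrated with a simpler stand-in. The sketch below uses a truncated SVD of a single grayscale matrix, not the tensor decomposition the paper applies, so it shows only the general principle that a low-rank approximation discards small, unstructured perturbations. The function name, image size, rank, and noise level are assumptions chosen for the demonstration.

```python
import numpy as np

def low_rank_approx(image, rank):
    """Best rank-`rank` approximation of a 2-D array via truncated SVD."""
    U, s, Vt = np.linalg.svd(image, full_matrices=False)
    return (U[:, :rank] * s[:rank]) @ Vt[:rank, :]

# Synthetic demo: a rank-1 "image" plus small random noise as the perturbation.
rng = np.random.default_rng(0)
clean = np.outer(rng.random(32), rng.random(32))
noisy = clean + 0.01 * rng.standard_normal(clean.shape)
denoised = low_rank_approx(noisy, rank=1)

err_before = np.linalg.norm(noisy - clean)
err_after = np.linalg.norm(denoised - clean)
print(err_before, err_after)
```

In this synthetic setting the approximation lands closer to the clean matrix than the noisy input does. Real images are not exactly low rank, so the choice of rank trades perturbation removal against loss of image detail.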

25th Annual Network and Distributed System Security Symposium, NDSS

Feature squeezing: Detecting adversarial examples in deep neural networks

W. Xu, D. Evans, Y. Qi

Published: 2018

Journal of Intelligent & Fuzzy Systems: Applications in Engineering and Technology

GAN-based classifier protection against adversarial attacks

S. Liu, M. Shao, X. Liu

Published: 2020

Detecting adversarial examples via neural fingerprinting

S. Dathathri, S. Zheng, T. Yin, R. M. Murray, Y. Yue

Published: 2018

Proc. of the 36th International Conf on Machine Learning

Certified adversarial robustness via randomized smoothing

J. Cohen, E. Rosenfeld, Z. Kolter

Published: 2019

Proc. of the 25th Int Conf on Pattern Recognition

Defense mechanism against adversarial attacks using density-based representation of images

Y.-T. Huang, W.-H. Liao, C.-W. Huang

Published: 2021

Engineering

Adversarial attacks and defenses in deep learning

K. Ren, T. Zheng, Z. Qin, X. Liu

Published: 2020

Proc. of IEEE/CVF Conference on Computer Vision and Pattern Recognition

Towards robust image classification using sequential attention models

D. Zoran, M. Chrzanowski, P.-S. Huang, S. Gowal, A. Mott, P. Kohli

Published: 2020

Proceedings of the IEEE

Gradient-based learning applied to document recognition

Y. Lecun, L. Bottou, Y. Bengio, P. Haffner

Published: 1998

Fashion-mnist: a novel image dataset for benchmarking machine learning algorithms

H. Xiao, K. Rasul, R. Vollgraf

Published: 2017

3rd International Conference on Learning Representations

Very deep convolutional networks for large-scale image recognition

K. Simonyan, A. Zisserman

Published: 2015

arXiv

Gaussian Error Linear Units (GELUs)

Dan Hendrycks, Kevin Gimpel

Published: 2016

arXiv

Cited by 1

IEEE Symposium on Security and Privacy

Towards Evaluating the Robustness of Neural Networks

Nicholas Carlini, David Wagner

Published: 2016.8.17

Neural networks provide state-of-the-art results for most machine learning tasks. Unfortunately, neural networks are vulnerable to adversarial examples: given an input $x$ and any target classification $t$, it is possible to find a new input $x'$ that is similar to $x$ but classified as $t$. This makes it difficult to apply neural networks in security-critical areas. Defensive distillation is a recently proposed approach that can take an arbitrary neural network, and increase its robustness, reducing the success rate of current attacks' ability to find adversarial examples from $95\%$ to $0.5\%$. In this paper, we demonstrate that defensive distillation does not significantly increase the robustness of neural networks by introducing three new attack algorithms that are successful on both distilled and undistilled neural networks with $100\%$ probability. Our attacks are tailored to three distance metrics used previously in the literature, and when compared to previous adversarial example generation algorithms, our attacks are often much more effective (and never worse). Furthermore, we propose using high-confidence adversarial examples in a simple transferability test we show can also be used to break defensive distillation. We hope our attacks will be used as a benchmark in future defense attempts to create neural networks that resist adversarial examples.

Keywords: model robustness, adversarial examples, model robustness guarantees
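The abstract above refers to three distance metrics without naming them; in the paper these are the L0, L2, and L-infinity distances between the original input and the adversarial one. The helper below is a small illustrative sketch of how they can be measured. Its name and the example arrays are hypothetical.

```python
import numpy as np

def perturbation_norms(x, x_adv):
    """L0, L2, and L-infinity size of the perturbation x_adv - x."""
    delta = (np.asarray(x_adv, dtype=float) - np.asarray(x, dtype=float)).ravel()
    return {
        "l0": int(np.count_nonzero(delta)),    # number of changed entries
        "l2": float(np.linalg.norm(delta)),    # Euclidean length
        "linf": float(np.max(np.abs(delta))),  # largest single change
    }

# Made-up example: two of four entries are changed.
norms = perturbation_norms([0.0, 0.0, 0.0, 0.0], [0.0, 3.0, 0.0, 4.0])
print(norms)
```

Here two entries differ, so the L0 distance is 2, the L2 distance is 5.0, and the L-infinity distance is 4.0.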