Adversarial attacks, particularly the Fast Gradient Sign Method (FGSM) and
Projected Gradient Descent (PGD), pose significant threats to the robustness of
deep learning models in image classification. This paper explores and refines
defense mechanisms against these attacks to enhance the resilience of neural
networks. We employ a combination of adversarial training and innovative
preprocessing techniques, aiming to mitigate the impact of adversarial
perturbations. Our methodology involves modifying input data before
classification and investigating different model architectures and training
strategies. Through rigorous evaluation on benchmark datasets, we demonstrate
the effectiveness of our approach in defending against FGSM and PGD attacks.
Our results show substantial improvements in model robustness compared to
baseline methods, highlighting the potential of our defense strategies in
real-world applications. This study contributes to the ongoing efforts to
develop secure and reliable machine learning systems, offering practical
insights and paving the way for future research in adversarial defense. By
bridging theoretical advancements and practical implementation, we aim to
enhance the trustworthiness of AI applications in safety-critical domains.
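
The two attacks named in the abstract can be illustrated on a toy model. The sketch below is not the paper's implementation: it assumes a hypothetical logistic-regression classifier with hand-picked weights `w` and bias `b`, inputs scaled to [0, 1], and an L-infinity budget `eps`. All function names and values are illustrative assumptions.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def bce_loss(x, y, w, b):
    """Binary cross-entropy of a logistic model on one example (y is 0 or 1)."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def loss_grad_wrt_input(x, y, w, b):
    """Analytic gradient of the loss with respect to the input x: (p - y) * w."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    return [(p - y) * wi for wi in w]

def sign(v):
    return (v > 0) - (v < 0)

def clip(v, lo, hi):
    return max(lo, min(hi, v))

def fgsm(x, y, w, b, eps):
    """One signed-gradient step of size eps, kept inside the [0, 1] pixel range."""
    g = loss_grad_wrt_input(x, y, w, b)
    return [clip(xi + eps * sign(gi), 0.0, 1.0) for xi, gi in zip(x, g)]

def pgd(x, y, w, b, eps, alpha, steps):
    """Repeated signed-gradient steps of size alpha, each followed by a
    projection back onto the L-infinity ball of radius eps around x."""
    x_adv = list(x)
    for _ in range(steps):
        g = loss_grad_wrt_input(x_adv, y, w, b)
        x_adv = [xa + alpha * sign(gi) for xa, gi in zip(x_adv, g)]
        x_adv = [clip(clip(xa, x0 - eps, x0 + eps), 0.0, 1.0)
                 for xa, x0 in zip(x_adv, x)]
    return x_adv

# Hypothetical model and input, chosen only for illustration.
w, b = [2.0, -3.0, 1.0], 0.1
x, y = [0.5, 0.5, 0.5], 1
x_fgsm = fgsm(x, y, w, b, eps=0.1)
x_pgd = pgd(x, y, w, b, eps=0.1, alpha=0.03, steps=10)
```

Adversarial training, in this simplified picture, would amount to computing `x_fgsm` or `x_pgd` for each training example and including them in the loss that is minimized. The paper's actual training procedure and preprocessing steps are not specified in the abstract.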
External datasets
MNIST
Fashion-MNIST
References
Proceedings of the International Conference on Learning Representations
Distillation as a Defense to Adversarial Perturbations against Deep Neural Networks
Nicolas Papernot, Patrick McDaniel, Xi Wu, Somesh Jha, Ananthram Swami
Published: 2015.11.14
Deep learning algorithms have been shown to perform extremely well on many
classical machine learning problems. However, recent studies have shown that
deep learning, like other machine learning techniques, is vulnerable to
adversarial samples: inputs crafted to force a deep neural network (DNN) to
provide adversary-selected outputs. Such attacks can seriously undermine the
security of the system supported by the DNN, sometimes with devastating
consequences. For example, autonomous vehicles can be crashed, illicit or
illegal content can bypass content filters, or biometric authentication systems
can be manipulated to allow improper access. In this work, we introduce a
defensive mechanism called defensive distillation to reduce the effectiveness
of adversarial samples on DNNs. We analytically investigate the
generalizability and robustness properties granted by the use of defensive
distillation when training DNNs. We also empirically study the effectiveness of
our defense mechanisms on two DNNs placed in adversarial settings. The study
shows that defensive distillation can reduce effectiveness of sample creation
from 95% to less than 0.5% on a studied DNN. Such dramatic gains can be
explained by the fact that distillation leads gradients used in adversarial
sample creation to be reduced by a factor of $10^{30}$. We also find that
distillation increases the average minimum number of features that need to be
modified to create adversarial samples by about 800% on one of the DNNs we
tested.
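
Defensive distillation relies on a softmax evaluated at a temperature parameter, which the abstract above does not spell out. The following minimal sketch shows only that ingredient. The function name and the example logits are hypothetical, and the full procedure (training a second network on the softened outputs of the first) is not reproduced here.

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Softmax of logits / temperature; a higher temperature gives a flatter
    (softer) probability vector."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for a three-class problem.
logits = [5.0, 1.0, 0.0]
p_hard = softmax_with_temperature(logits, temperature=1.0)
p_soft = softmax_with_temperature(logits, temperature=20.0)
```

With these inputs, `p_soft` is much closer to uniform than `p_hard`. These softened probabilities are what the distilled network is trained to match.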
Adversarial attack vulnerability of medical image analysis systems: Unexplored factors
Gerda Bortsova, Cristina González-Gonzalo, Suzanne C Wetstein, Florian Dubost, Ioannis Katramados, Laurens Hogeweg, Bart Liefers, Bram van Ginneken, Josien PW Pluim, Mitko Veta
Published: 2021
Proc of IEEE 24th International Conference on High Performance Computing & Communications; 8th International Conference on Data Science & Systems; 20th International Conference on Smart City; 8th International Conference on Dependability in Sensor, Cloud & Big Data Systems & Applications
A block gray adversarial attack method for image classification neural network
C. Li, C. Fan, J. Zhang, C. Li, Y. Teng
Published: 2022
On the robustness of large multimodal models against image adversarial attacks
Xuanming Cui, Alejandro Aparcedo, Young Kyun Jang, Ser-Nam Lim
Published: 2023
6th International Conference on Learning Representations (ICLR)
Ensemble adversarial training: Attacks and defenses
Tramer, F., Kurakin, A., Papernot, N., Goodfellow, I. J., Boneh, D., McDaniel, P. D.
Published: 2018
Springer
Adversarial attack versus a bio-inspired defensive method for image classification
O. Garcia-Porras, S. Salazar-Colores, E.U. Moya-Sanchez, A. Sanchez-Perez
Published: 2023
International conference on machine learning
Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples
Anish Athalye, Nicholas Carlini, David Wagner
Published: 2018
Proc. of the 7th International Conference on Image, Vision and Computing
Adversarial attacks and defenses in image classification: A practical perspective
Y. Chen, M. Zhang, J. Li, X. Kuang
Published: 2022
Proc of 7th International Conference on Big Data Analytics
Defense against adversarial attacks using image label and pixel guided sparse denoiser
Towards Evaluating the Robustness of Neural Networks
Nicholas Carlini, David Wagner
Published: 2016.8.17
Neural networks provide state-of-the-art results for most machine learning
tasks. Unfortunately, neural networks are vulnerable to adversarial examples:
given an input $x$ and any target classification $t$, it is possible to find a
new input $x'$ that is similar to $x$ but classified as $t$. This makes it
difficult to apply neural networks in security-critical areas. Defensive
distillation is a recently proposed approach that can take an arbitrary neural
network, and increase its robustness, reducing the success rate of current
attacks' ability to find adversarial examples from $95\%$ to $0.5\%$.
In this paper, we demonstrate that defensive distillation does not
significantly increase the robustness of neural networks by introducing three
new attack algorithms that are successful on both distilled and undistilled
neural networks with $100\%$ probability. Our attacks are tailored to three
distance metrics used previously in the literature, and when compared to
previous adversarial example generation algorithms, our attacks are often much
more effective (and never worse). Furthermore, we propose using high-confidence
adversarial examples in a simple transferability test we show can also be used
to break defensive distillation. We hope our attacks will be used as a
benchmark in future defense attempts to create neural networks that resist
adversarial examples.
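
The abstract above refers to three distance metrics without naming them. In the Carlini and Wagner paper these are the L0, L2, and L-infinity distances between the original input and the adversarial input. A minimal sketch of how each could be computed on flattened inputs follows. The function names are illustrative assumptions, not taken from any library.

```python
import math

def l0_distance(x, x_adv):
    """Number of coordinates (e.g. pixels) that were changed."""
    return sum(1 for a, b in zip(x, x_adv) if a != b)

def l2_distance(x, x_adv):
    """Euclidean norm of the perturbation."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, x_adv)))

def linf_distance(x, x_adv):
    """Largest change applied to any single coordinate."""
    return max(abs(a - b) for a, b in zip(x, x_adv))

# Hypothetical three-pixel input and a perturbed copy.
x = [0.0, 0.5, 1.0]
x_adv = [0.0, 0.75, 0.5]
d0, d2, dinf = l0_distance(x, x_adv), l2_distance(x, x_adv), linf_distance(x, x_adv)
```

FGSM and PGD, as usually formulated, bound the L-infinity distance. The other two metrics reward changing few pixels (L0) or keeping the overall perturbation energy small (L2).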