In recent years, machine learning models, especially deep neural networks,
have been widely used for classification tasks in the security domain. However,
these models have been shown to be vulnerable to adversarial manipulation:
small changes learned by an adversarial attack model, when applied to the
input, can cause significant changes in the output. Most research on
adversarial attacks and corresponding defense methods focuses only on scenarios
where adversarial samples are directly generated by the attack model. In this
study, we explore a more practical scenario in behavior-based authentication,
where the adversarial samples are collected from attackers who replicate the
samples generated by the attack model with a certain level of discrepancy.
We propose an eXplainable AI (XAI)-based defense strategy
against adversarial attacks in such scenarios. A feature selector, trained with
our method, can be used as a filter in front of the original authenticator. It
filters out features that are more vulnerable to adversarial attacks or
irrelevant to authentication, while retaining features that are more robust.
Through comprehensive experiments, we demonstrate that our XAI-based defense
strategy is effective against adversarial attacks and outperforms other defense
strategies, such as adversarial training and defensive distillation.
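The filtering idea above can be illustrated with a toy example. This sketch is hypothetical and is not the authors' XAI-based selector. The boolean `keep_mask` stands in for whatever a trained feature selector would output, and `toy_authenticator` is an assumed linear scoring rule. `replicate_with_discrepancy` models the attacker's imperfect replication as additive Gaussian noise, which is only one possible assumption.

```python
import random
from typing import List, Sequence


def apply_feature_filter(features: Sequence[float], keep_mask: Sequence[bool]) -> List[float]:
    """Zero out features that the (hypothetical) selector marked as vulnerable or irrelevant."""
    if len(features) != len(keep_mask):
        raise ValueError("features and keep_mask must have the same length")
    return [x if keep else 0.0 for x, keep in zip(features, keep_mask)]


def toy_authenticator(features: Sequence[float], weights: Sequence[float], bias: float) -> bool:
    """Hypothetical linear authenticator: accept the user when the score is positive."""
    score = sum(w * x for w, x in zip(weights, features)) + bias
    return score > 0.0


def replicate_with_discrepancy(sample: Sequence[float], noise_scale: float,
                               rng: random.Random) -> List[float]:
    """Assumed attacker model: reproduce each feature with additive Gaussian error."""
    return [x + rng.gauss(0.0, noise_scale) for x in sample]


weights = [1.0, 1.0, 3.0]        # feature 2 dominates the score, making it an easy target
bias = -1.5
keep_mask = [True, True, False]  # hypothetical selector output: drop feature 2

genuine = [1.0, 1.0, 0.5]
impostor = [0.2, 0.1, 0.0]
adversarial = [0.2, 0.1, 1.0]    # model-generated sample: impostor with feature 2 pushed up
replicated = replicate_with_discrepancy(adversarial, noise_scale=0.05, rng=random.Random(0))

for name, sample in [("genuine", genuine), ("impostor", impostor),
                     ("adversarial", adversarial), ("replicated", replicated)]:
    raw = toy_authenticator(sample, weights, bias)
    filtered = toy_authenticator(apply_feature_filter(sample, keep_mask), weights, bias)
    print(f"{name:<12} raw={raw} filtered={filtered}")
```

In this toy setup the adversarial sample succeeds only by inflating the feature that the mask drops. Filtering therefore restores rejection for both the model-generated sample and its noisy replica, while the genuine user is still accepted. Zeroing dropped features is just one simple filtering choice. A real system might instead train the authenticator on the selected features only.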
DeepFool: A Simple and Accurate Method to Fool Deep Neural Networks
S.-M. Moosavi-Dezfooli, A. Fawzi, P. Frossard
Published: 2016
ICLR
Intriguing Properties of Neural Networks
C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow, R. Fergus
Published: 2014
arXiv
IEEE Symposium on Security and Privacy
Distillation as a Defense to Adversarial Perturbations against Deep Neural Networks
Nicolas Papernot, Patrick McDaniel, Xi Wu, Somesh Jha, Ananthram Swami
Published: 2015.11.14
Deep learning algorithms have been shown to perform extremely well on many
classical machine learning problems. However, recent studies have shown that
deep learning, like other machine learning techniques, is vulnerable to
adversarial samples: inputs crafted to force a deep neural network (DNN) to
provide adversary-selected outputs. Such attacks can seriously undermine the
security of the system supported by the DNN, sometimes with devastating
consequences. For example, autonomous vehicles can be crashed, illicit or
illegal content can bypass content filters, or biometric authentication systems
can be manipulated to allow improper access. In this work, we introduce a
defensive mechanism called defensive distillation to reduce the effectiveness
of adversarial samples on DNNs. We analytically investigate the
generalizability and robustness properties granted by the use of defensive
distillation when training DNNs. We also empirically study the effectiveness of
our defense mechanisms on two DNNs placed in adversarial settings. The study
shows that defensive distillation can reduce the success rate of adversarial
sample creation from 95% to less than 0.5% on one of the studied DNNs. Such
dramatic gains can be explained by the fact that distillation reduces the
gradients used in adversarial sample creation by a factor of $10^{30}$. We also find that
distillation increases the average minimum number of features that need to be
modified to create adversarial samples by about 800% on one of the DNNs we
tested.
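The gradient-shrinking effect described above can be illustrated numerically. Defensive distillation trains at a softmax temperature T and deploys at T = 1. The sketch below simplifies this by assuming that the distilled model's logits come out roughly T times larger than an ordinary model's. It then measures the largest entry of the softmax Jacobian with respect to those logits, using dp_i/dz_j = p_i(δ_ij − p_j). The helper names and the value T = 100 are hypothetical. The toy numbers are not meant to reproduce the factor reported in the abstract.

```python
import math
from typing import List, Sequence


def softmax_with_temperature(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax(z / T)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    shift = max(scaled)  # subtract the max to avoid overflow in exp
    exps = [math.exp(s - shift) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def max_softmax_jacobian(probs: Sequence[float]) -> float:
    """Largest |dp_i/dz_j| of softmax at T = 1, using dp_i/dz_j = p_i * (delta_ij - p_j)."""
    n = len(probs)
    return max(
        abs(probs[i] * ((1.0 if i == j else 0.0) - probs[j]))
        for i in range(n)
        for j in range(n)
    )


logits = [2.0, 1.0, 0.1]  # hypothetical logits of an ordinary model for one input
T = 100.0                 # hypothetical distillation temperature

# High temperature during training yields soft, near-uniform targets.
soft_labels = softmax_with_temperature(logits, temperature=20.0)
# Ordinary model evaluated at T = 1.
plain_probs = softmax_with_temperature(logits)
# Assumption: the distilled model's logits are about T times larger; it is evaluated at T = 1.
deployed_logits = [T * z for z in logits]
deployed_probs = softmax_with_temperature(deployed_logits)

print("soft labels:", [round(p, 3) for p in soft_labels])
print("max |dp/dz| plain   :", max_softmax_jacobian(plain_probs))
print("max |dp/dz| deployed:", max_softmax_jacobian(deployed_probs))
```

The deployed softmax saturates to an almost one-hot output, so its Jacobian entries collapse toward zero. Gradient-based attacks that back-propagate through the softmax then receive almost no signal.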
Towards Evaluating the Robustness of Neural Networks
Nicholas Carlini, David Wagner
Published: 2016.8.17
Neural networks provide state-of-the-art results for most machine learning
tasks. Unfortunately, neural networks are vulnerable to adversarial examples:
given an input $x$ and any target classification $t$, it is possible to find a
new input $x'$ that is similar to $x$ but classified as $t$. This makes it
difficult to apply neural networks in security-critical areas. Defensive
distillation is a recently proposed approach that can take an arbitrary neural
network and increase its robustness, reducing the success rate of current
attacks at finding adversarial examples from $95\%$ to $0.5\%$.
In this paper, we demonstrate that defensive distillation does not
significantly increase the robustness of neural networks by introducing three
new attack algorithms that are successful on both distilled and undistilled
neural networks with $100\%$ probability. Our attacks are tailored to three
distance metrics used previously in the literature, and when compared to
previous adversarial example generation algorithms, our attacks are often much
more effective (and never worse). Furthermore, we propose using high-confidence
adversarial examples in a simple transferability test, which we show can also be
used to break defensive distillation. We hope our attacks will be used as a
benchmark in future defense attempts to create neural networks that resist
adversarial examples.
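The three distance metrics the attacks are tailored to are $L_0$, $L_2$ and $L_\infty$. The "high-confidence" examples come from a margin parameter $\kappa$ applied to the logits $Z$. The sketch below computes the three distances and the margin loss $\max(\max_{i \ne t} Z_i - Z_t, -\kappa)$. It then combines them into the $L_2$ objective $\|\delta\|_2^2 + c \cdot f(x + \delta)$. It assumes the caller already has the logits of the perturbed input. The optimizer, the tanh change of variables and the binary search over $c$ are omitted, so this illustrates the objective rather than implementing a working attack. The function names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def lp_distances(x: Sequence[float], x_adv: Sequence[float]) -> Tuple[int, float, float]:
    """Return the (L0, L2, Linf) distances between an input and its perturbed version."""
    if len(x) != len(x_adv):
        raise ValueError("inputs must have the same length")
    deltas = [a - b for a, b in zip(x_adv, x)]
    l0 = sum(1 for d in deltas if d != 0.0)             # number of coordinates changed
    l2 = math.sqrt(sum(d * d for d in deltas))           # Euclidean size of the change
    linf = max((abs(d) for d in deltas), default=0.0)    # largest single-coordinate change
    return l0, l2, linf


def cw_margin_loss(logits: Sequence[float], target: int, kappa: float = 0.0) -> float:
    """max(max_{i != t} Z_i - Z_t, -kappa).

    Positive while another class still beats the target. Clipped at -kappa once the
    target leads every other class by at least kappa.
    """
    best_other = max(z for i, z in enumerate(logits) if i != target)
    return max(best_other - logits[target], -kappa)


def cw_l2_objective(x: Sequence[float], x_adv: Sequence[float], logits_adv: Sequence[float],
                    target: int, c: float, kappa: float = 0.0) -> float:
    """Squared L2 distortion plus c times the margin loss on the perturbed input's logits."""
    _, l2, _ = lp_distances(x, x_adv)
    return l2 ** 2 + c * cw_margin_loss(logits_adv, target, kappa)


x = [0.0, 0.5, 1.0]
x_adv = [0.0, 0.8, 0.6]
logits_adv = [1.0, 3.0, 2.0]  # hypothetical logits of the model on x_adv

print("L0, L2, Linf:", lp_distances(x, x_adv))
print("margin loss, target 1, kappa 0.5:", cw_margin_loss(logits_adv, target=1, kappa=0.5))
print("margin loss, target 0:", cw_margin_loss(logits_adv, target=0))
print("objective, target 1, c = 1:", cw_l2_objective(x, x_adv, logits_adv, target=1, c=1.0))
```

Raising $\kappa$ keeps the loss negative only while the target class wins by a larger margin. The optimizer therefore continues to push toward a more confidently misclassified input, which is the property the transferability test relies on.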