Abstract
Deep learning has greatly improved visual recognition in recent years.
However, recent research has shown that there exist many adversarial examples
that can degrade the performance of deep networks. This paper
focuses on detecting those adversarial examples by analyzing whether they come
from the same distribution as the normal examples. Instead of directly training
a deep neural network to detect adversarial examples, we propose a much simpler
approach based on statistics of the outputs of convolutional layers. A cascade
classifier is designed to detect adversarial examples efficiently. Furthermore,
when trained on examples from one particular adversarial generating mechanism,
the resulting classifier also successfully detects adversarial examples from a
completely different mechanism. The resulting classifier is
non-subdifferentiable, which makes it difficult for adversaries to attack it
using the gradient of the classifier. After detecting adversarial examples, we
show that many of them can be recovered by simply applying a small average
filter to the image. These findings should lead to more insights about the
classification mechanisms in
deep convolutional neural networks.
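
The recovery step mentioned above, a small average filter applied to the image, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `mean_filter`, the default 3 x 3 window, the grayscale list-of-rows image format, and the edge-replication border handling are all assumptions made for illustration, since the abstract specifies none of them.

```python
def mean_filter(image, k=3):
    """Apply a k x k average filter to a 2D grayscale image (list of rows).

    Hypothetical helper for illustration. Borders are handled by replicating
    edge pixels, which is an assumption; the abstract does not say how
    borders are treated or which window size is used.
    """
    if k < 1 or k % 2 == 0:
        raise ValueError("k must be a positive odd integer")
    h, w = len(image), len(image[0])
    r = k // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total = 0.0
            for dy in range(-r, r + 1):
                for dx in range(-r, r + 1):
                    # Clamp coordinates so out-of-range neighbours reuse the edge pixel.
                    yy = min(max(y + dy, 0), h - 1)
                    xx = min(max(x + dx, 0), w - 1)
                    total += image[yy][xx]
            out[y][x] = total / (k * k)
    return out


# A single bright pixel (a crude stand-in for a localized perturbation)
# is spread over its neighbourhood and its peak is strongly reduced.
spike = [[0, 0, 0],
         [0, 9, 0],
         [0, 0, 0]]
smoothed = mean_filter(spike)
```

In a pipeline of the kind the abstract describes, such a filter would presumably be applied only to inputs the detector has flagged, and the smoothed image would then be passed back to the original network for classification.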