Adversarial Examples Detection in Deep Networks with Convolutional Filter Statistics

Source

https://arxiv.org/abs/1612.07767

PDF

https://arxiv.org/pdf/1612.07767

Paper Information

Authors: Xin Li, Fuxin Li
Published: 2016-12-23
Updated: 2017-10-27
Affiliation: School of Electrical Engineering and Computer Science, Oregon State University
Country: United States of America
Conference: IEEE/CVF International Conference on Computer Vision (ICCV)

Labels Estimated by AI

Adversarial Example Detection, Deep Learning Model, Adversarial Example

These labels were automatically added by AI and may be inaccurate.

Abstract

Deep learning has greatly improved visual recognition in recent years. However, recent research has shown that there exist many adversarial examples that can negatively impact the performance of such an architecture. This paper focuses on detecting those adversarial examples by analyzing whether they come from the same distribution as the normal examples. Instead of directly training a deep neural network to detect adversarials, a much simpler approach was proposed based on statistics on outputs from convolutional layers. A cascade classifier was designed to efficiently detect adversarials. Furthermore, trained from one particular adversarial generating mechanism, the resulting classifier can successfully detect adversarials from a completely different mechanism as well. The resulting classifier is non-subdifferentiable, hence creates a difficulty for adversaries to attack by using the gradient of the classifier. After detecting adversarial examples, we show that many of them can be recovered by simply performing a small average filter on the image. Those findings should lead to more insights about the classification mechanisms in deep convolutional neural networks.
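
Two steps named in the abstract can be illustrated with a minimal sketch: summarizing convolutional-layer outputs as per-channel statistics, and smoothing an image with a small average filter. The sketch is not the authors' implementation. The abstract does not say which statistics are used or how large the filter is, so the choice of minimum, maximum, and median, the 3x3 window, the edge clamping, and the function names `channel_statistics` and `average_filter` are all assumptions made for illustration. The cascade classifier that would consume the statistics is not shown.

```python
from statistics import median


def channel_statistics(feature_map):
    """Flatten a conv-layer output into a feature vector of simple statistics.

    feature_map: list of channels, each a 2D list of floats.
    Assumption: min, max, and median per channel; the abstract does not
    specify which statistics the paper uses.
    """
    features = []
    for channel in feature_map:
        values = [v for row in channel for v in row]
        features.extend([min(values), max(values), median(values)])
    return features


def average_filter(image, size=3):
    """Apply a size x size box (mean) filter to a 2D grayscale image.

    Assumption: pixels outside the image are replaced by the nearest
    edge pixel (clamping); the 3x3 default is also an assumption.
    """
    height, width = len(image), len(image[0])
    radius = size // 2
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            total = 0.0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    yy = min(max(y + dy, 0), height - 1)  # clamp row index
                    xx = min(max(x + dx, 0), width - 1)   # clamp column index
                    total += image[yy][xx]
            out[y][x] = total / (size * size)
    return out
```

In such a setup, the vector returned by `channel_statistics` would be passed to a detector trained on normal and adversarial examples, and `average_filter` would be applied to inputs flagged as adversarial before they are classified again. In practice a library box filter would normally replace the explicit loops.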

External Datasets

ImageNet validation set

ILSVRC-2012 validation set

EA-adversarial images

DeepFool adversarial images