Detecting Adversarial Examples Is (Nearly) As Hard As Classifying Them

arxiv

Source

https://arxiv.org/abs/2107.11630

PDF

https://arxiv.org/pdf/2107.11630

Paper Information

Author: Florian Tramèr
Published: 7-25-2021
Updated: 6-16-2022
Affiliation: Google Research
Country: United States of America

Labels Estimated by AI

Defense Mechanism, High Difficulty Sample, Role of Machine Learning

These labels were automatically added by AI and may be inaccurate.

Abstract

Making classifiers robust to adversarial examples is hard. Thus, many defenses tackle the seemingly easier task of detecting perturbed inputs. We show a barrier towards this goal. We prove a general hardness reduction between detection and classification of adversarial examples: given a robust detector for attacks at distance ε (in some metric), we can build a similarly robust (but inefficient) classifier for attacks at distance ε/2. Our reduction is computationally inefficient, and thus cannot be used to build practical classifiers. Instead, it is a useful sanity check to test whether empirical detection results imply something much stronger than the authors presumably anticipated. To illustrate, we revisit 13 detector defenses. For 11/13 cases, we show that the claimed detection results would imply an inefficient classifier with robustness far beyond the state-of-the-art.
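
The reduction described in the abstract can be illustrated with a small sketch. The abstract does not spell out the construction, so the code below is an assumption about its general shape, not the paper's implementation: to classify an input x, search every point within ε/2 of x and return the label of any point the detector accepts. By the triangle inequality, such a point lies within ε of the clean input, where a robust detector may only output the correct label or reject. The names classifier_from_detector and toy_detector, the use of None as the reject signal, and the one-dimensional integer search space are all hypothetical choices made for illustration.

```python
from typing import Callable, Iterable, Optional

# Hypothetical convention: a detector returns a class label,
# or None when it rejects the input as adversarial.
Label = Optional[int]


def classifier_from_detector(
    detector: Callable[[int], Label],
    candidates: Iterable[int],
    distance: Callable[[int, int], float],
    eps: float,
) -> Callable[[int], Label]:
    """Build a brute-force classifier from a detector assumed robust at distance eps.

    The search enumerates the whole candidate space, which is why this
    kind of construction is computationally inefficient in practice.
    """
    search_space = list(candidates)

    def classify(x: int) -> Label:
        for x_prime in search_space:
            # Only consider points within eps / 2 of the input.
            if distance(x, x_prime) <= eps / 2:
                label = detector(x_prime)
                if label is not None:
                    # Any accepted nearby point determines the output.
                    return label
        # No accepted point nearby: x is not within eps / 2 of clean data.
        return None

    return classify


def toy_detector(x: int) -> Label:
    """Toy detector on the integer line, with clean inputs at -6 and 6.

    It labels confidently far from the boundary and rejects everything
    in between, which makes it robust at distance eps = 4 around -6 and 6.
    """
    if x <= -5:
        return 0
    if x >= 5:
        return 1
    return None


classify = classifier_from_detector(
    toy_detector,
    candidates=range(-20, 21),
    distance=lambda a, b: abs(a - b),
    eps=4,
)

# The detector rejects the perturbed input -4, but the derived
# classifier still recovers the label of the nearby clean input -6.
print(toy_detector(-4), classify(-4))  # prints "None 0"
```

In this toy setting the derived classifier labels every input within ε/2 = 2 of a clean point correctly, while inputs far from both clean points (such as 0) are left unlabeled. On realistic inputs such as images, the exhaustive search is infeasible, which matches the abstract's point that the reduction serves as a sanity check on claimed detection results rather than as a practical defense.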

External Datasets

MNIST

CIFAR-10

ImageNet