These labels were automatically added by AI and may be inaccurate. For details, see About Literature Database.
Abstract
Making classifiers robust to adversarial examples is hard. Thus, many
defenses tackle the seemingly easier task of detecting perturbed inputs. We
show a barrier towards this goal. We prove a general hardness reduction between
detection and classification of adversarial examples: given a robust detector
for attacks at distance {\epsilon} (in some metric), we can build a similarly
robust (but inefficient) classifier for attacks at distance {\epsilon}/2. Our
reduction is computationally inefficient, and thus cannot be used to build
practical classifiers. Instead, it is a useful sanity check to test whether
empirical detection results imply something much stronger than the authors
presumably anticipated. To illustrate, we revisit 13 detector defenses. For
11/13 cases, we show that the claimed detection results would imply an
inefficient classifier with robustness far beyond the state-of-the-art.