Making classifiers robust to adversarial examples is hard. Thus, many
defenses tackle the seemingly easier task of detecting perturbed inputs. We
show a barrier to this goal. We prove a general hardness reduction between
detection and classification of adversarial examples: given a robust detector
for attacks at distance $\epsilon$ (in some metric), we can build a similarly
robust (but inefficient) classifier for attacks at distance $\epsilon/2$. Our
reduction is computationally inefficient, and thus cannot be used to build
practical classifiers. Instead, it serves as a sanity check of whether
empirical detection results imply something much stronger than the authors
presumably anticipated. To illustrate, we revisit 13 detector defenses. In 11
of the 13 cases, we show that the claimed detection results would imply an
inefficient classifier with robustness far beyond the state of the art.
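As a concrete illustration, here is a minimal sketch of the reduction in
notation introduced only for this summary (the paper's precise construction
may differ in details). Suppose the detector $D$ and base classifier $f$ are
jointly robust at radius $\epsilon$: clean examples are accepted by $D$ and
labeled correctly by $f$, and every point within distance $\epsilon$ of a
clean example is either flagged by $D$ or labeled correctly by $f$. The
inefficient classifier $g$ can then be defined as
\[
  g(x) =
  \begin{cases}
    f(\hat{x}) & \text{if some } \hat{x} \text{ with } d(x,\hat{x}) \le \epsilon/2 \text{ is accepted by } D, \\
    \bot & \text{otherwise,}
  \end{cases}
\]
where $d$ is the metric and $\bot$ denotes abstention. Robustness of $g$ at
radius $\epsilon/2$ follows from the triangle inequality: if the input lies
within $\epsilon/2$ of a clean example $x_0$ with label $y$, then $x_0$
itself is an accepted candidate, and every accepted $\hat{x}$ lies within
$\epsilon$ of $x_0$, so $\epsilon$-robust detection forces $f(\hat{x}) = y$.
The exhaustive search over the ball of radius $\epsilon/2$ is what makes $g$
computationally inefficient.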