We propose an approach to distinguish between correct and incorrect image
classifications. Our approach can detect misclassifications that occur either
\emph{unintentionally} (``natural errors'') or as a result of
\emph{intentional adversarial attacks} (``adversarial errors''), both within a
single \emph{unified framework}. Our approach is based on the observation that
correctly classified images tend to exhibit robust and consistent
classifications under certain image transformations (e.g., a horizontal flip
or a small image translation). In contrast, incorrectly classified images
(whether due to adversarial errors or natural errors) tend to exhibit large
variations in classification results under such transformations. Our approach
requires no modification or retraining of the classifier, and can therefore
be applied to any pre-trained classifier. We further use state-of-the-art
targeted adversarial attacks to demonstrate that even when the adversary has
full knowledge of our method, the adversarial distortion needed to bypass
our detector is \emph{no longer imperceptible to the human eye}. Our approach
achieves state-of-the-art results, surpassing previous adversarial detection
methods by a large margin.
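As a rough illustration of the idea, the sketch below scores the stability of a
pre-trained classifier's prediction under a few simple transformations (a
horizontal flip and one-pixel translations) and flags unstable predictions as
suspected errors. The transformation set, the agreement-based consistency
score, and the threshold are illustrative assumptions, not the exact
configuration of our method.

\begin{verbatim}
# Minimal sketch (PyTorch): flag a prediction as a suspected error
# (natural or adversarial) when it is unstable under small image
# transformations. The transformations, score, and threshold here
# are illustrative assumptions.
import torch

def transformed_batch(x):
    """Simple transformations of a single image x of shape (C, H, W)."""
    views = [torch.flip(x, dims=[-1])]  # horizontal flip
    # Wrap-around roll used as a cheap stand-in for 1-pixel translation.
    for shift in [(0, 1), (0, -1), (1, 0), (-1, 0)]:
        views.append(torch.roll(x, shifts=shift, dims=(-2, -1)))
    return torch.stack(views)

@torch.no_grad()
def consistency_score(model, x):
    """Fraction of transformed copies whose top-1 label matches the
    label predicted for the original image."""
    model.eval()
    original_label = model(x.unsqueeze(0)).argmax(dim=1)
    transformed_labels = model(transformed_batch(x)).argmax(dim=1)
    return (transformed_labels == original_label).float().mean().item()

def is_suspected_error(model, x, threshold=0.8):
    """Flag the classification of x as a suspected misclassification
    when its consistency under transformations falls below threshold."""
    return consistency_score(model, x) < threshold
\end{verbatim}

In practice the score could instead aggregate softmax outputs over the
transformed copies, and the threshold would be calibrated on a validation set;
the sketch only conveys the stability test itself.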