On Detecting Adversarial Perturbations

arXiv

AI Security Portal bot

The information in this literature database is collected automatically.

Source

https://arxiv.org/abs/1702.04267

PDF

https://arxiv.org/pdf/1702.04267

Bibliographic Information

Authors: Jan Hendrik Metzen, Tim Genewein, Volker Fischer, Bastian Bischoff
Published: 2017-02-15
Updated: 2017-02-21
Affiliation: Bosch Center for Artificial Intelligence, Robert Bosch GmbH
Country of affiliation: Germany
Conference: International Conference on Learning Representations (ICLR)

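Labels Estimated by AI

Detection of adversarial examples

Model robustness

Adversarial examples

* These labels were added automatically by AI and may therefore be inaccurate.
For details, see "About the Literature Database".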

Abstract

Machine learning and deep learning in particular has advanced tremendously on perceptual tasks in recent years. However, it remains vulnerable against adversarial perturbations of the input that have been crafted specifically to fool the system while being quasi-imperceptible to a human. In this work, we propose to augment deep neural networks with a small "detector" subnetwork which is trained on the binary classification task of distinguishing genuine data from data containing adversarial perturbations. Our method is orthogonal to prior work on addressing adversarial perturbations, which has mostly focused on making the classification network itself more robust. We show empirically that adversarial perturbations can be detected surprisingly well even though they are quasi-imperceptible to humans. Moreover, while the detectors have been trained to detect only a specific adversary, they generalize to similar and weaker adversaries. In addition, we propose an adversarial attack that fools both the classifier and the detector and a novel training procedure for the detector that counteracts this attack.
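
The detector idea described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation. It assumes a logistic-regression classifier in place of a deep network, the fast gradient sign method as the adversary, and a second logistic regression as the detector, fed the raw input plus the classifier's output probability as a stand-in for intermediate activations. All names (`train_logreg`, `fgsm`, `detector_features`, `EPS`) and all hyperparameters are hypothetical.

```python
import math
import random

EPS = 0.5  # assumed L-infinity budget of the adversary (illustrative value)


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def sign(v):
    return (v > 0) - (v < 0)


def predict(params, x):
    # Probability of class 1 under a logistic model (w, b).
    w, b = params
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def train_logreg(data, dim, epochs=200, lr=0.1):
    # Plain SGD on the cross-entropy loss.
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        for x, y in data:
            err = predict((w, b), x) - y  # d(loss)/d(logit)
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b


def fgsm(params, x, y, eps):
    # One signed-gradient step that increases the classifier's loss.
    # For a logistic model, d(loss)/dx_i = (p - y) * w_i.
    w, _ = params
    err = predict(params, x) - y
    return [xi + eps * sign(err * wi) for xi, wi in zip(x, w)]


def detector_features(classifier, x):
    # Assumption: raw input plus classifier output stands in for the
    # intermediate feature maps a detector subnetwork would read.
    return list(x) + [predict(classifier, x)]


rng = random.Random(0)
# Two Gaussian clusters as toy "genuine" data: (input, class label).
clean = [([rng.gauss(m, 1.0), rng.gauss(m, 1.0)], y)
         for y, m in ((0, -2.0), (1, 2.0)) for _ in range(50)]

classifier = train_logreg(clean, dim=2)
adv = [fgsm(classifier, x, y, EPS) for x, y in clean]

# Binary detection task: label 0 = genuine, label 1 = adversarial.
det_data = ([(detector_features(classifier, x), 0) for x, _ in clean]
            + [(detector_features(classifier, x), 1) for x in adv])
detector = train_logreg(det_data, dim=3)
```

Calling `predict(detector, detector_features(classifier, x))` then gives the detector's estimated probability that `x` has been perturbed. The sketch only shows how the detection dataset and the second model are set up; it makes no claim about detection accuracy on this toy problem or about the results reported in the paper.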

External Datasets

CIFAR10

ImageNet