We present a novel adversarial detection and correction method for machine
learning classifiers. The detector consists of an autoencoder trained with a
custom loss function based on the Kullback-Leibler divergence between the
classifier predictions on the original and reconstructed instances (see the
sketch below). The method is unsupervised, easy to train, and does not require
any knowledge of the
underlying attack. The detector almost completely neutralises powerful attacks
like Carlini-Wagner or SLIDE on MNIST and Fashion-MNIST, and remains very
effective on CIFAR-10 when the attack is granted full access to the
classification model but not the defence. We show that our method can still
detect adversarial examples in the white-box setting, where the attacker has
full knowledge of both the model and the defence, and we investigate the
robustness of the attack. The method is flexible and can also be used
to detect common data corruptions and perturbations that negatively impact
model performance. We illustrate this capability on the CIFAR-10-C dataset.
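
To make the training objective concrete, the following is a minimal sketch of how such a detector could be trained, assuming a frozen, pretrained Keras classifier `clf` with softmax outputs and an autoencoder `ae`. The names, architecture, and TensorFlow framing are illustrative assumptions, not the authors' exact implementation.

```python
# A minimal sketch of the objective described in the abstract: the
# autoencoder is fit to minimise the KL divergence between the frozen
# classifier's predictions on the original and reconstructed instances.
# `ae`, `clf`, and the softmax-output assumption are illustrative.
import tensorflow as tf

kld = tf.keras.losses.KLDivergence(reduction=tf.keras.losses.Reduction.NONE)

def detector_loss(clf: tf.keras.Model, x: tf.Tensor,
                  x_recon: tf.Tensor) -> tf.Tensor:
    """KL(clf(x) || clf(x_recon)), averaged over the batch."""
    p = clf(x, training=False)        # predictions on original instances
    q = clf(x_recon, training=False)  # predictions on reconstructions
    return tf.reduce_mean(kld(p, q))  # no pixel-space reconstruction term

def train_step(ae: tf.keras.Model, clf: tf.keras.Model,
               opt: tf.keras.optimizers.Optimizer, x: tf.Tensor) -> tf.Tensor:
    # Only the autoencoder's weights are updated; the classifier stays frozen.
    with tf.GradientTape() as tape:
        loss = detector_loss(clf, x, ae(x, training=True))
    grads = tape.gradient(loss, ae.trainable_variables)
    opt.apply_gradients(zip(grads, ae.trainable_variables))
    return loss
```

At test time, the same divergence KL(clf(x) || clf(ae(x))) can serve as the adversarial score: instances whose score exceeds a threshold are flagged as adversarial, and the prediction on the reconstruction ae(x) provides the correction.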