Defending against Whitebox Adversarial Attacks via Randomized Discretization

arXiv

AI Security Portal bot

Information in the literature database is collected automatically.

Source

https://arxiv.org/abs/1903.10586

PDF

https://arxiv.org/pdf/1903.10586

Paper Information

Authors: Yuchen Zhang, Percy Liang
Published: 2019-3-26
Affiliation: Microsoft Corporation
Country of affiliation: United States of America
Conference name

Labels Estimated by AI

Adversarial attack detection
Model robustness guarantees
Effective perturbation methods

* These labels were added automatically by AI and may therefore be inaccurate.
For details, see "About the Literature Database".

Abstract

Adversarial perturbations dramatically decrease the accuracy of state-of-the-art image classifiers. In this paper, we propose and analyze a simple and computationally efficient defense strategy: inject random Gaussian noise, discretize each pixel, and then feed the result into any pre-trained classifier. Theoretically, we show that our randomized discretization strategy reduces the KL divergence between original and adversarial inputs, leading to a lower bound on the classification accuracy of any classifier against any (potentially whitebox) $\ell_\infty$-bounded adversarial attack. Empirically, we evaluate our defense on adversarial examples generated by a strong iterative PGD attack. On ImageNet, our defense is more robust than adversarially-trained networks and the winning defenses of the NIPS 2017 Adversarial Attacks & Defenses competition.
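
The preprocessing pipeline described in the abstract (inject Gaussian noise, discretize each pixel, pass the result to a pre-trained classifier) can be illustrated with a minimal sketch. The abstract does not specify how pixels are discretized, so the uniform grid of intensity levels used here is an assumption, as are the function name `randomized_discretize`, the parameters `sigma` and `levels`, and their default values. The classifier is left out. This is not the authors' implementation.

```python
import numpy as np


def randomized_discretize(image, sigma=0.1, levels=4, rng=None):
    """Add Gaussian noise to an image, then snap each value to a coarse grid.

    image  : float array with values in [0, 1], any shape (e.g. H x W x 3).
    sigma  : standard deviation of the injected Gaussian noise (assumed value).
    levels : number of grid points per channel (assumed uniform grid;
             the abstract does not state the discretization scheme).
    """
    rng = np.random.default_rng() if rng is None else rng

    # Step 1: inject i.i.d. Gaussian noise into every pixel value.
    noisy = image + rng.normal(0.0, sigma, size=image.shape)
    noisy = np.clip(noisy, 0.0, 1.0)

    # Step 2: discretize by rounding to the nearest of `levels` evenly
    # spaced values in [0, 1].
    return np.round(noisy * (levels - 1)) / (levels - 1)


# Hypothetical usage: a random "image" and an l_inf-bounded perturbed copy.
rng = np.random.default_rng(0)
x = rng.random((8, 8, 3))
x_adv = np.clip(x + rng.uniform(-0.03, 0.03, size=x.shape), 0.0, 1.0)

out = randomized_discretize(x, rng=np.random.default_rng(1))
out_adv = randomized_discretize(x_adv, rng=np.random.default_rng(2))
# `out` and `out_adv` would then be fed to any pre-trained classifier.
```

The intuition is that a small $\ell_\infty$-bounded perturbation is largely masked by the injected noise and the coarse rounding, so the processed clean and adversarial inputs have similar distributions. The accuracy bound itself comes from the paper's KL-divergence analysis, which this sketch does not reproduce.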