Abstract
Adversarial perturbations dramatically decrease the accuracy of state-of-the-art image classifiers. In this paper, we propose and analyze a simple and computationally efficient defense strategy: inject random Gaussian noise, discretize each pixel, and then feed the result into any pre-trained classifier. Theoretically, we show that our randomized discretization strategy reduces the KL divergence between original and adversarial inputs, leading to a lower bound on the classification accuracy of any classifier against any (potentially whitebox) ℓ∞-bounded adversarial attack. Empirically, we evaluate our defense on adversarial examples generated by a strong iterative PGD attack. On ImageNet, our defense is more robust than adversarially trained networks and the winning defenses of the NIPS 2017 Adversarial Attacks & Defenses competition.
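To make the noise-then-discretize pipeline concrete, the following is a minimal NumPy sketch under stated assumptions: the color palette `centers`, the noise scale `sigma`, and the `pretrained_classifier` call are illustrative placeholders, not the paper's exact configuration (one natural way to obtain such a palette is k-means over pixel colors, but the specific choice here is an assumption).

```python
import numpy as np

def randomized_discretization(image, centers, sigma=0.1, rng=None):
    """Sketch of the defense: add Gaussian noise, then snap each pixel
    to its nearest color center before classification.

    image:   float array of shape (H, W, C), values in [0, 1]
    centers: float array of shape (k, C), the discretization palette
             (an assumed input; e.g., k-means centers over pixel colors)
    sigma:   standard deviation of the injected Gaussian noise (assumed value)
    """
    rng = np.random.default_rng() if rng is None else rng
    # 1) Inject random Gaussian noise, clipped back to the valid range.
    noisy = np.clip(image + rng.normal(0.0, sigma, image.shape), 0.0, 1.0)
    # 2) Discretize: assign every pixel to its nearest center (L2 in color space).
    flat = noisy.reshape(-1, noisy.shape[-1])                       # (H*W, C)
    d2 = ((flat[:, None, :] - centers[None, :, :]) ** 2).sum(-1)    # (H*W, k)
    quantized = centers[d2.argmin(axis=1)].reshape(noisy.shape)
    # 3) The result would then be fed into any pre-trained classifier, e.g.:
    # logits = pretrained_classifier(quantized)   # hypothetical model call
    return quantized
```

In use, `randomized_discretization(x, centers)` with a small palette (say, k = 8 colors) would be applied to every input, clean or adversarial, before classification; the injected noise is what makes the preprocessing randomized rather than a fixed quantizer, and a fresh noise draw per query is what the theoretical KL-divergence argument relies on.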