Adversarial perturbations dramatically decrease the accuracy of
state-of-the-art image classifiers. In this paper, we propose and analyze a
simple and computationally efficient defense strategy: inject random Gaussian
noise, discretize each pixel, and then feed the result into any pre-trained
classifier. Theoretically, we show that our randomized discretization strategy
reduces the KL divergence between original and adversarial inputs, leading to a
lower bound on the classification accuracy of any classifier against any
(potentially white-box) $\ell_\infty$-bounded adversarial attack. Empirically,
we evaluate our defense on adversarial examples generated by a strong iterative
PGD attack. On ImageNet, our defense is more robust than adversarially trained
networks and the winning defenses of the NIPS 2017 Adversarial Attacks &
Defenses competition.
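
The preprocessing pipeline described above (add Gaussian noise, discretize each pixel, pass the result to a pre-trained classifier) can be illustrated with a minimal NumPy sketch. The abstract does not specify how pixels are discretized, so this sketch assumes each noisy pixel is snapped to the nearest color in a small fixed codebook. The function name `randomized_discretize`, the `codebook` array, and the value of `sigma` are hypothetical choices made for illustration, not details taken from the paper.

```python
import numpy as np


def randomized_discretize(image, codebook, sigma, rng):
    """Add Gaussian noise to an image, then snap each pixel to a codebook color.

    image:    float array of shape (H, W, C), e.g. values in [0, 1]
    codebook: float array of shape (K, C) holding the allowed colors (assumed)
    sigma:    standard deviation of the injected Gaussian noise (assumed)
    rng:      a numpy.random.Generator, so the randomness is reproducible
    """
    # Step 1: inject i.i.d. Gaussian noise into every pixel channel.
    noisy = image + rng.normal(0.0, sigma, size=image.shape)

    # Step 2: squared Euclidean distance from each noisy pixel to each
    # codebook color; the result has shape (H, W, K).
    diff = noisy[:, :, None, :] - codebook[None, None, :, :]
    dist2 = (diff ** 2).sum(axis=-1)

    # Step 3: replace each pixel with its nearest codebook color.
    nearest = dist2.argmin(axis=-1)
    return codebook[nearest]


# Usage with a toy 4-color codebook and a random 8x8 RGB image.
rng = np.random.default_rng(0)
codebook = np.array(
    [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
)
image = rng.random((8, 8, 3))
defended = randomized_discretize(image, codebook, sigma=0.1, rng=rng)
# `defended` would then be fed to any pre-trained classifier.
```

Because the output depends on the input only through the noisy image, a small $\ell_\infty$ perturbation of the input shifts the mean of the Gaussian only slightly, and discretization cannot make the two output distributions easier to tell apart. This is the intuition behind the KL-divergence argument mentioned above.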