We present a method for provably defending any pretrained image classifier
against $\ell_p$ adversarial attacks. This method, for instance, allows public
vision API providers and users to seamlessly convert pretrained non-robust
classification services into provably robust ones. By prepending a
custom-trained denoiser to any off-the-shelf image classifier and using
randomized smoothing, we effectively create a new classifier that is guaranteed
to be $\ell_p$-robust to adversarial examples, without modifying the pretrained
classifier. Our approach applies to both the white-box and the black-box
settings of the pretrained classifier. We refer to this defense as denoised
smoothing, and we demonstrate its effectiveness through extensive
experimentation on ImageNet and CIFAR-10. Finally, we use our approach to
provably defend the Azure, Google, AWS, and Clarifai image classification APIs.
Code replicating all the experiments in the paper is available at:
https://github.com/microsoft/denoised-smoothing.
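To make the construction concrete, here is a minimal sketch of the prediction step in PyTorch. It assumes two hypothetical `torch.nn.Module` objects, `classifier` (the frozen pretrained classifier $f$) and `denoiser` (the custom-trained denoiser $D$); the function name and parameters are illustrative and not the repository's actual API. The smoothed classifier predicts by majority vote of $f(D(x + \varepsilon))$ over Gaussian noise draws $\varepsilon \sim \mathcal{N}(0, \sigma^2 I)$:

```python
import torch

@torch.no_grad()
def denoised_smoothing_predict(classifier, denoiser, x, sigma=0.25,
                               num_samples=100, num_classes=10):
    """Monte Carlo estimate of the smoothed classifier's prediction
    g(x) = argmax_c  P[ f(D(x + eps)) = c ],  eps ~ N(0, sigma^2 I).

    classifier, denoiser: hypothetical nn.Module instances (f and D).
    x: a single image tensor of shape [C, H, W].
    """
    # Replicate the input and corrupt each copy with Gaussian noise.
    batch = x.unsqueeze(0).repeat(num_samples, 1, 1, 1)
    noisy = batch + sigma * torch.randn_like(batch)

    # Denoise, then classify with the unmodified pretrained classifier.
    preds = classifier(denoiser(noisy)).argmax(dim=1)

    # Majority vote over the noise samples yields the smoothed prediction.
    counts = torch.bincount(preds, minlength=num_classes)
    return counts.argmax().item()
```

Note that this sketch covers only the plug-in prediction; obtaining a certified $\ell_2$ radius additionally requires the statistical certification procedure of randomized smoothing (confidence bounds on the top-class probability), for which the linked repository should be consulted.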