In the last couple of years, several adversarial attack methods based on
different threat models have been proposed for the image classification
problem. Most existing defenses consider additive threat models in which sample
perturbations have bounded L_p norms. These defenses, however, can be
vulnerable to adversarial attacks under non-additive threat models. An
example of an attack method based on a non-additive threat model is the
Wasserstein adversarial attack proposed by Wong et al. (2019), where the
distance between an image and its adversarial example is determined by the
Wasserstein metric ("earth mover's distance") between their normalized pixel
intensities. Until now, there has been no certifiable defense against this type
of attack. In this work, we propose the first defense with certified robustness
against Wasserstein adversarial attacks using randomized smoothing. We develop
this certificate by considering the space of possible flows between images, and
representing this space such that the Wasserstein distance between images is
upper-bounded by the L_1 distance in this flow space. We can then apply existing
randomized smoothing certificates for the L_1 metric. On the MNIST and CIFAR-10
datasets, we find that our proposed defense is also practically effective,
demonstrating significantly improved accuracy under Wasserstein adversarial
attack compared to unprotected models.
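
As a toy illustration of the flow-space idea, the following Python sketch works on a 1-D histogram rather than a 2-D image. It assumes unit ground distance between adjacent bins and a hypothetical flow vector `delta` whose entry i is the mass moved from bin i to bin i+1 (negative values move mass the other way). The helper names `apply_flow` and `wasserstein_1d` are illustrative and are not taken from the paper. Under these assumptions, the Wasserstein distance between a histogram and its flowed version is bounded by the L_1 norm of the flow (and in this 1-D setting the two coincide).

```python
import numpy as np


def apply_flow(x, delta):
    """Apply a local flow to a 1-D histogram (toy assumption, not the paper's 2-D setup).

    delta[i] is the mass moved from bin i to bin i+1; a negative value
    moves mass from bin i+1 back to bin i. Total mass is preserved.
    """
    x = np.asarray(x, dtype=float)
    delta = np.asarray(delta, dtype=float)
    if delta.shape != (x.size - 1,):
        raise ValueError("delta must have one entry per pair of adjacent bins")
    out = x.copy()
    out[:-1] -= delta  # mass leaving bin i toward bin i+1
    out[1:] += delta   # the same mass arriving at bin i+1
    return out


def wasserstein_1d(p, q):
    """1-Wasserstein distance between equal-mass 1-D histograms.

    With unit distance between adjacent bins, this is the sum of absolute
    differences between the two cumulative sums.
    """
    diff = np.asarray(p, dtype=float) - np.asarray(q, dtype=float)
    return float(np.abs(np.cumsum(diff)).sum())


# Usage: a normalized 3-bin "image" and a small hypothetical flow.
x = np.array([0.5, 0.3, 0.2])
delta = np.array([0.1, -0.05])
x_adv = apply_flow(x, delta)

w = wasserstein_1d(x, x_adv)
l1 = float(np.abs(delta).sum())
print(f"W1 = {w:.4f}, ||delta||_1 = {l1:.4f}")
```

Because distance in flow space controls the Wasserstein distance, a certificate stated for the L_1 norm of `delta` carries over to a Wasserstein radius around `x`; the 2-D case in the paper uses flows between neighboring pixels in both image directions, which this sketch does not model.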