Adversarial patch attacks are among the most practical threat models
against real-world computer vision systems. This paper studies certified and
empirical defenses against patch attacks. We begin with a set of experiments
showing that most existing defenses, which work by pre-processing input images
to mitigate adversarial patches, are easily broken by simple white-box
adversaries. Motivated by this finding, we propose the first certified defense
against patch attacks, along with faster methods for training it.
Furthermore, we experiment with different patch shapes at test time, obtaining
surprisingly good robustness transfer across shapes, and present preliminary
results on certified defense against sparse attacks. Our complete
implementation can be found at:
https://github.com/Ping-C/certifiedpatchdefense.