The vulnerability of deep networks to adversarial attacks is a central
problem for deep learning from the perspective of both cognition and security.
The most successful defense to date is adversarial training, in which a
classifier is trained on adversarial images generated during learning. Another
approach transforms or purifies the input to remove adversarial signals before
the image is classified. We focus on defending naturally-trained
classifiers using Markov Chain Monte Carlo (MCMC) sampling with an Energy-Based
Model (EBM) for adversarial purification. In contrast to adversarial training,
our approach is intended to secure pre-existing and highly vulnerable
classifiers.
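For intuition, the purification step can be sketched as Langevin dynamics on a
pre-trained EBM. The code below is a minimal illustration, not the
repository's actual implementation: the function name ebm_purify, the
energy_fn interface (a batch of images mapped to per-image scalar energies),
and all hyperparameter values are assumptions.

    import torch

    def ebm_purify(x, energy_fn, n_steps=1500, step_size=0.01, noise_scale=0.005):
        # Langevin-dynamics purification sketch: repeatedly descend the
        # EBM's energy landscape with added Gaussian noise so that
        # adversarial signals decay while image appearance persists.
        x = x.clone().detach().requires_grad_(True)
        for _ in range(n_steps):
            energy = energy_fn(x).sum()
            grad = torch.autograd.grad(energy, x)[0]
            # Langevin update: gradient step on energy plus Gaussian noise
            x = x - 0.5 * step_size * grad + noise_scale * torch.randn_like(x)
            x = x.clamp(0, 1).detach().requires_grad_(True)
        return x.detach()

The classifier then predicts from the purified sample rather than the raw
input.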
The memoryless behavior of long-run MCMC sampling will eventually remove
adversarial signals, while metastable behavior preserves the consistent
appearance of MCMC samples over many steps, allowing accurate long-run
prediction.
Balancing these factors can lead to effective purification and robust
classification. We evaluate adversarial defense with an EBM using the strongest
known attacks against purification. Our contributions are 1) an improved method
for training EBMs with realistic long-run MCMC samples, 2) an
Expectation-Over-Transformation (EOT) defense that resolves theoretical
ambiguities for stochastic defenses and from which the EOT attack naturally
follows, and 3) state-of-the-art adversarial defense for naturally-trained
classifiers and defense competitive with adversarially-trained classifiers on
CIFAR-10, SVHN, and CIFAR-100. Code and pre-trained models are
available at https://github.com/point0bar1/ebm-defense.
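As a usage sketch, the EOT defense can be realized by averaging class
probabilities over several independent stochastic purification runs. The names
below (eot_predict, classifier, purify, and the replicate count) are
hypothetical placeholders; purify could be any stochastic transform such as
the ebm_purify sketch above.

    import torch

    def eot_predict(x, classifier, purify, n_replicates=8):
        # Average softmax outputs over independent stochastic
        # purification runs, then predict the most likely class.
        probs = None
        for _ in range(n_replicates):
            p = torch.softmax(classifier(purify(x)), dim=1)
            probs = p if probs is None else probs + p
        return (probs / n_replicates).argmax(dim=1)

The corresponding EOT attack targets this same averaged prediction, which is
what makes it the natural adaptive attack against a stochastic defense.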