Adversarial training, a method for learning robust deep networks, is
typically assumed to be more expensive than traditional training due to the
necessity of constructing adversarial examples via a first-order method like
projected gradient descent (PGD). In this paper, we make the surprising
discovery that it is possible to train empirically robust models using a much
weaker and cheaper adversary, an approach that was previously believed to be
ineffective, rendering the method no more costly than standard training in
practice. Specifically, we show that adversarial training with the fast
gradient sign method (FGSM), when combined with random initialization, is as
effective as PGD-based training but has significantly lower cost. Furthermore,
we show that FGSM adversarial training can be further accelerated by using
standard techniques for efficient training of deep networks, allowing us to
learn a robust CIFAR10 classifier with 45% robust accuracy to PGD attacks with
$\epsilon=8/255$ in 6 minutes, and a robust ImageNet classifier with 43% robust
accuracy at $\epsilon=2/255$ in 12 hours, in comparison to past work based on
"free" adversarial training, which took 10 and 50 hours to reach the same
respective thresholds. Finally, we identify a failure mode, which we call
"catastrophic overfitting", that may have caused previous attempts to use FGSM
adversarial training to fail. All code for reproducing the experiments in this
paper, as well as pretrained model weights, is available at
https://github.com/locuslab/fast_adversarial.
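
As an illustration of the core idea, FGSM with random initialization can be sketched in a few lines. The following is a minimal, dependency-free sketch and not the released implementation: it assumes a toy logistic-regression model (so the input gradient is available in closed form), and the names `input_gradient`, `fgsm_random_init`, and the step size `alpha` are hypothetical choices made for this example.

```python
import math
import random


def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def input_gradient(w, b, x, y):
    """Gradient of the logistic loss w.r.t. the input x, for a label y in {0, 1}.

    Assumption: a toy linear model p = sigmoid(w . x + b) stands in for a deep
    network, for which this gradient would come from backpropagation instead.
    """
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    p = 1.0 / (1.0 + math.exp(-z))
    return [(p - y) * wi for wi in w]


def fgsm_random_init(w, b, x, y, eps, alpha, rng):
    """One FGSM step from a random start, projected onto the L-infinity eps-ball."""
    # Start from a uniformly random point inside the eps-ball around x.
    delta = [rng.uniform(-eps, eps) for _ in x]
    # Take a single signed-gradient step of size alpha from that random start.
    x_start = [xi + di for xi, di in zip(x, delta)]
    grad = input_gradient(w, b, x_start, y)
    delta = [di + alpha * sign(gi) for di, gi in zip(delta, grad)]
    # Clip so that the final perturbation stays within the eps-ball.
    return [max(-eps, min(eps, di)) for di in delta]


if __name__ == "__main__":
    rng = random.Random(0)
    w, b = [1.0, -2.0, 0.5], 0.1      # hypothetical model parameters
    x, y = [0.2, 0.4, 0.6], 1         # hypothetical input and label
    eps = 8 / 255
    delta = fgsm_random_init(w, b, x, y, eps, alpha=1.25 * eps, rng=rng)
    print(delta)
```

In adversarial training, the model would then be updated on the perturbed input `x + delta` rather than on `x`. The step size `alpha` is a tunable assumption in this sketch; the only property relied on here is that the perturbation is clipped back to the $\epsilon$-ball after the step.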