Current neural-network-based classifiers are susceptible to adversarial
examples. The most empirically successful approach to defending against such
adversarial examples is adversarial training, which incorporates a strong
self-attack during training to enhance the model's robustness. This approach,
however, is computationally expensive and hence hard to scale up. A recent
work, called fast adversarial training, has shown that it is possible to
markedly reduce computation time without significantly sacrificing performance. This
approach incorporates simple self-attacks, yet it can be run for only a limited
number of training epochs, resulting in sub-optimal performance. In this paper,
we conduct experiments to understand the behavior of fast adversarial training
and show that the key to its success is its ability to recover from overfitting
to weak attacks. We then extend our findings to improve fast adversarial
training, demonstrating robust accuracy superior to that of strong adversarial
training, with much-reduced training time.
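
For concreteness, the contrast above can be sketched as follows (the notation
is ours, not part of the abstract): adversarial training solves the min-max
problem on the left, where the inner maximization is carried out by a strong
attack such as multi-step PGD, while fast adversarial training approximates it
with the single-step FGSM perturbation on the right.
\[
\min_{\theta}\ \mathbb{E}_{(x,y)\sim\mathcal{D}}
\Big[\max_{\|\delta\|_{\infty}\le\epsilon}\ell\big(f_{\theta}(x+\delta),\,y\big)\Big],
\qquad
\delta_{\mathrm{FGSM}}=\epsilon\cdot\operatorname{sign}\big(\nabla_{x}\,\ell(f_{\theta}(x),y)\big),
\]
where $f_{\theta}$ is the classifier, $\ell$ the training loss, and $\epsilon$
the perturbation budget.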