Adversarial training (AT) is one of the most effective strategies for
improving model robustness. However, recent benchmarks show that most of the
proposed improvements to AT are less effective than simply early stopping the
training procedure. This counter-intuitive finding motivates us to investigate the
implementation details of dozens of AT methods. Surprisingly, we find that the
basic settings (e.g., weight decay and training schedule) used in these
methods are highly inconsistent. In this work, we provide comprehensive
evaluations on CIFAR-10, focusing on the effects of largely overlooked training
tricks and hyperparameters for adversarially trained models. Our empirical
observations suggest that adversarial robustness is far more sensitive to some
basic training settings than previously recognized. For example, a slightly different
value of weight decay can reduce a model's robust accuracy by more than 7%,
which is likely to outweigh the potential gains brought by the proposed
methods themselves. We distill a baseline training setting and re-implement previous
defenses to achieve new state-of-the-art results. These findings also call for
closer attention to the overlooked confounders when benchmarking defenses.
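
To make the "basic settings" concrete, below is a minimal sketch of PGD-based adversarial training on CIFAR-10 in PyTorch. The architecture, attack parameters, optimizer hyperparameters (e.g., weight_decay=5e-4), and schedule milestones are illustrative assumptions for exposition, not the settings prescribed by this work.

```python
# Minimal PGD adversarial training sketch, highlighting the easily overlooked
# basic settings (weight decay, learning-rate schedule). Values are assumptions.
import torch
import torch.nn.functional as F
import torchvision
import torchvision.transforms as T

device = "cuda" if torch.cuda.is_available() else "cpu"

def pgd_attack(model, x, y, eps=8 / 255, alpha=2 / 255, steps=10):
    """L-infinity PGD: iteratively ascend the loss within an eps-ball."""
    delta = torch.empty_like(x).uniform_(-eps, eps)
    delta = (x + delta).clamp(0, 1) - x          # keep x + delta a valid image
    for _ in range(steps):
        delta.requires_grad_(True)
        loss = F.cross_entropy(model(x + delta), y)
        grad, = torch.autograd.grad(loss, delta)
        delta = (delta + alpha * grad.sign()).clamp(-eps, eps).detach()
        delta = (x + delta).clamp(0, 1) - x
    return (x + delta).detach()

# ResNet-18 as an illustrative stand-in for the architectures under study.
model = torchvision.models.resnet18(num_classes=10).to(device)

# The "basic settings" the abstract refers to: a small change to weight_decay
# here is the kind of confounder that can shift robust accuracy by several points.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=5e-4)  # assumed value
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer,
                                                 milestones=[100, 105])  # assumed

train_set = torchvision.datasets.CIFAR10(
    root="./data", train=True, download=True,
    transform=T.Compose([T.RandomCrop(32, padding=4),
                         T.RandomHorizontalFlip(), T.ToTensor()]))
loader = torch.utils.data.DataLoader(train_set, batch_size=128,
                                     shuffle=True, num_workers=2)

for epoch in range(110):                          # assumed epoch budget
    model.train()
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        x_adv = pgd_attack(model, x, y)           # inner maximization
        loss = F.cross_entropy(model(x_adv), y)   # outer minimization
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    scheduler.step()
```

Every hyperparameter in this sketch (weight decay, schedule milestones, attack steps, batch size) is exactly the kind of fixed-in-advance choice whose inconsistency across published AT methods the paper identifies as a confounder.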