It is commonly believed that networks cannot be both accurate and robust,
and that gaining robustness means losing accuracy. It is also generally believed
that, short of making networks larger, architectural elements matter
little for improving adversarial robustness. Here we present
evidence challenging these common beliefs through a careful study of adversarial
training. Our key observation is that the widely-used ReLU activation function
significantly weakens adversarial training due to its non-smooth nature. Hence
we propose smooth adversarial training (SAT), in which we replace ReLU with its
smooth approximations to strengthen adversarial training. Smooth
activation functions allow SAT to find harder adversarial examples
and to compute better gradient updates during adversarial training.
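The core idea, swapping ReLU for a smooth approximation, can be sketched as below. The parametric softplus and the sharpness value alpha here are illustrative assumptions for this sketch, not necessarily the exact form or setting used in the paper; the point is that the smooth surrogate has a well-defined, non-zero gradient everywhere, whereas ReLU's gradient is zero on all negative inputs and undefined at zero.

```python
import math

def relu(x):
    # ReLU: non-smooth at x = 0; gradient is exactly 0 for all x < 0,
    # which degrades the gradients used during adversarial training.
    return max(x, 0.0)

def softplus(x, alpha=5.0):
    # Parametric softplus, one smooth approximation of ReLU
    # (alpha is an assumed sharpness knob; larger alpha -> closer to ReLU).
    return math.log1p(math.exp(alpha * x)) / alpha

def softplus_grad(x, alpha=5.0):
    # The derivative is a sigmoid: smooth and strictly positive everywhere,
    # so gradients never vanish on negative pre-activations.
    return 1.0 / (1.0 + math.exp(-alpha * x))

for x in (-1.0, 0.0, 2.0):
    print(x, relu(x), softplus(x), softplus_grad(x))
```

The sketch shows why a smooth surrogate can both generate stronger adversarial examples (better inner-maximization gradients) and improve the outer weight updates, at essentially the same computational cost as ReLU.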
Compared to standard adversarial training, SAT improves adversarial
robustness for "free", i.e., with no drop in accuracy and no increase in
computational cost. For example, without introducing additional computation,
SAT improves ResNet-50's robustness from 33.0% to 42.3% while
also raising its ImageNet accuracy by 0.9%. SAT also scales well to larger
networks: it helps EfficientNet-L1 achieve 82.2% accuracy and 58.6%
robustness on ImageNet, outperforming the previous state-of-the-art defense by
9.5% in accuracy and 11.6% in robustness. Models are available at
https://github.com/cihangxie/SmoothAdversarialTraining.