By injecting adversarial examples into training data, adversarial training is
promising for improving the robustness of deep learning models. However, most
existing adversarial training approaches are based on a specific type of
adversarial attack. Such an approach may not provide sufficiently representative samples from
the adversarial domain, leading to weak generalization on adversarial
examples from other attacks. Moreover, during adversarial training,
adversarial perturbations on inputs are usually crafted by fast single-step
adversaries so as to scale to large datasets. This work mainly focuses on
adversarial training with the efficient Fast Gradient Sign Method (FGSM)
adversary. In this scenario, it is difficult to train a model that generalizes
well due to the lack of representative adversarial samples; that is, the
samples are unable to accurately reflect the adversarial domain. To alleviate
this problem, we propose a novel
Adversarial Training with Domain Adaptation (ATDA) method. Our intuition is to
regard adversarial training on the FGSM adversary as a domain adaptation task
with a limited number of target domain samples. The main idea is to learn a
representation that is semantically meaningful and domain invariant on the
clean domain as well as the adversarial domain. Empirical evaluations on
Fashion-MNIST, SVHN, CIFAR-10 and CIFAR-100 demonstrate that ATDA can greatly
improve the generalization of adversarial training and the smoothness of the
learned models, and outperforms state-of-the-art methods on standard benchmark
datasets. To show the transferability of our method, we also extend ATDA to
adversarial training on iterative attacks, such as PGD-Adversarial Training
(PAT), and the defense performance improves considerably.
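
As a rough illustration of the setting described above, the following sketch crafts single-step FGSM perturbations for a toy logistic model and combines the clean loss, the adversarial loss, and a penalty on the gap between the two domains. It is a minimal sketch under stated assumptions, not the ATDA implementation: the abstract does not specify the alignment losses, so the squared gap between mean logits is a hypothetical placeholder, and the names `fgsm`, `combined_objective`, `eps`, and `lam` are illustrative.

```python
import math

def logit(w, b, x):
    """Pre-activation output of a toy logistic model (stand-in for a deep net)."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def bce_loss(w, b, x, y):
    """Binary cross-entropy of the logistic model on one example."""
    p = 1.0 / (1.0 + math.exp(-logit(w, b, x)))
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def input_gradient(w, b, x, y):
    """Gradient of the cross-entropy loss with respect to the input x."""
    p = 1.0 / (1.0 + math.exp(-logit(w, b, x)))
    return [(p - y) * wi for wi in w]

def fgsm(w, b, x, y, eps):
    """Single-step FGSM: move each input coordinate by eps along the sign of the loss gradient."""
    grad = input_gradient(w, b, x, y)
    return [xi + eps * ((g > 0) - (g < 0)) for xi, g in zip(x, grad)]

def combined_objective(w, b, xs, ys, eps, lam):
    """Clean loss + adversarial loss + lam * domain gap.

    ASSUMPTION: the squared difference of mean logits is a hypothetical
    placeholder for a domain-alignment loss; it is not taken from the paper.
    """
    advs = [fgsm(w, b, x, y, eps) for x, y in zip(xs, ys)]
    n = len(xs)
    clean = sum(bce_loss(w, b, x, y) for x, y in zip(xs, ys)) / n
    adv = sum(bce_loss(w, b, x, y) for x, y in zip(advs, ys)) / n
    mean_clean = sum(logit(w, b, x) for x in xs) / n
    mean_adv = sum(logit(w, b, x) for x in advs) / n
    return clean + adv + lam * (mean_clean - mean_adv) ** 2
```

In a real training loop the objective would be minimized over the model parameters, with the adversarial examples regenerated at each step. The toy model only makes the three terms concrete.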