Abstract
By injecting adversarial examples into training data, adversarial training is
promising for improving the robustness of deep learning models. However, most
existing adversarial training approaches are based on a specific type of
adversarial attack. Such approaches may not provide sufficiently representative samples from
the adversarial domain, leading to a weak generalization ability on adversarial
examples from other attacks. Moreover, during the adversarial training,
adversarial perturbations on inputs are usually crafted by fast single-step
adversaries so as to scale to large datasets. This work focuses mainly on
adversarial training with the single-step yet efficient FGSM adversary. In this
scenario, it is difficult to train a model that generalizes well due to the lack of
representative adversarial samples, i.e., the samples are unable to accurately
reflect the adversarial domain. To alleviate this problem, we propose a novel
Adversarial Training with Domain Adaptation (ATDA) method. Our intuition is to
regard the adversarial training on FGSM adversary as a domain adaptation task
with a limited number of target domain samples. The main idea is to learn a
representation that is semantically meaningful and domain invariant on the
clean domain as well as the adversarial domain. Empirical evaluations on
Fashion-MNIST, SVHN, CIFAR-10 and CIFAR-100 demonstrate that ATDA can greatly
improve the generalization of adversarial training and the smoothness of the
learned models, and outperforms state-of-the-art methods on standard benchmark
datasets. To show the transfer ability of our method, we also extend ATDA to
adversarial training on iterative attacks, such as PGD-Adversarial Training
(PAT), and the defense performance is improved considerably.
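
For readers unfamiliar with the single-step FGSM (Fast Gradient Sign Method) adversary referred to above, the following is a minimal sketch of how one adversarial example might be crafted. It is not the ATDA method and not the authors' code. The logistic-regression model, the weights `w` and `b`, the input `x`, and the budget `eps` are all illustrative assumptions.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(w, b, x):
    # Probability of class 1 under a toy logistic-regression model (assumed).
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def loss(w, b, x, y):
    # Binary cross-entropy for a single example with label y in {0, 1}.
    p = predict(w, b, x)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def sign(v):
    return (v > 0) - (v < 0)

def fgsm(w, b, x, y, eps):
    # For this model the gradient of the loss w.r.t. the input is (p - y) * w.
    p = predict(w, b, x)
    grad_x = [(p - y) * wi for wi in w]
    # Single step: move each coordinate by eps in the direction of the gradient sign.
    return [xi + eps * sign(g) for xi, g in zip(x, grad_x)]

# Hypothetical values, chosen only for illustration.
w, b = [2.0, -1.0], 0.0
x, y = [0.5, 0.2], 1
x_adv = fgsm(w, b, x, y, eps=0.1)

print("clean loss:      ", loss(w, b, x, y))
print("adversarial loss:", loss(w, b, x_adv, y))
```

In adversarial training, examples such as `x_adv` would be added to the training data alongside the clean inputs. Because every perturbation comes from one gradient step, this is cheap, but the resulting samples cover only a small part of the adversarial domain.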