Adversarial training (AT) is among the most effective techniques for improving
model robustness; it augments the training data with adversarial examples.
However, most existing AT methods craft adversarial examples with one specific
attack, which leads to unreliable robustness against unseen attacks.
Moreover, a single attack algorithm may be insufficient to explore the space
of possible perturbations. In this paper, we introduce adversarial distributional
training (ADT), a novel framework for learning robust models. ADT is formulated
as a minimax optimization problem, where the inner maximization aims to learn
an adversarial distribution that characterizes the potential adversarial examples
around each natural example under an entropic regularizer, and the outer minimization
aims to train robust models by minimizing the expected loss over the worst-case
adversarial distributions. Through a theoretical analysis, we develop a general
algorithm for solving ADT, and present three approaches for parameterizing the
adversarial distributions, ranging from typical Gaussian distributions to
flexible implicit ones. Empirical results on several benchmarks validate
the effectiveness of ADT compared with state-of-the-art AT methods.
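
As a sketch of the minimax problem described above, the objective might be written as follows. The notation is assumed for illustration and is not taken from the abstract: $f_\theta$ is the model with parameters $\theta$, $\mathcal{L}$ is the training loss, $(x_i, y_i)$ for $i = 1, \dots, n$ are the natural examples and their labels, $p(\delta_i)$ is the adversarial distribution over perturbations $\delta_i$ around $x_i$, $\mathcal{P}$ is the set of admissible distributions (for example, those supported on a bounded perturbation set), $\mathcal{H}$ is the entropy, and $\lambda \ge 0$ weights the entropic regularizer.

```latex
\min_{\theta} \; \frac{1}{n} \sum_{i=1}^{n}
  \max_{p(\delta_i) \in \mathcal{P}}
  \Big\{
    \mathbb{E}_{p(\delta_i)}\big[ \mathcal{L}\big(f_\theta(x_i + \delta_i),\, y_i\big) \big]
    + \lambda \, \mathcal{H}\big(p(\delta_i)\big)
  \Big\}
```

Under this reading, with $\lambda = 0$ the inner maximization can concentrate all mass on a single worst-case perturbation, which recovers the usual AT objective, while $\lambda > 0$ favors distributions that spread over many harmful perturbations.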