Despite the growing prevalence of artificial neural networks in real-world
applications, their vulnerability to adversarial attacks remains a significant
concern, which motivates us to investigate the robustness of machine learning
models. While various heuristics aim to optimize the distributionally robust
risk using the $\infty$-Wasserstein metric, such a notion of robustness
frequently encounters computational intractability. To tackle this computational
challenge, we develop a novel approach to adversarial training that integrates
$\phi$-divergence regularization into the distributionally robust risk
function. This regularization is notably cheaper to compute
than the original formulation. We develop stochastic gradient methods
with biased oracles to solve this problem efficiently, achieving
near-optimal sample complexity. Moreover, we establish its regularization
effects and demonstrate its asymptotic equivalence to a regularized empirical
risk minimization framework, by considering various scaling regimes of the
regularization parameter and robustness level. These regimes yield gradient
norm regularization, variance regularization, or a smoothed gradient norm
regularization that interpolates between these extremes. We numerically
validate our proposed method in supervised learning, reinforcement learning,
and contextual learning, and showcase its state-of-the-art performance against
various adversarial attacks.