Neural networks are vulnerable to adversarial examples, i.e. inputs that are
imperceptibly perturbed from natural data and yet incorrectly classified by the
network. Adversarial training, a heuristic form of robust optimization that
alternates between minimization and maximization steps, has proven to be among
the most successful methods to train networks to be robust against a
pre-defined family of perturbations. This paper provides a partial answer to
the success of adversarial training, by showing that it converges to a network
where the surrogate loss with respect to the the attack algorithm is within
$\epsilon$ of the optimal robust loss. Then we show that the optimal robust
loss is also close to zero, hence adversarial training finds a robust
classifier. The analysis technique leverages recent work on the analysis of
neural networks via Neural Tangent Kernel (NTK), combined with motivation from
online-learning when the maximization is solved by a heuristic, and the
expressiveness of the NTK kernel in the $\ell_\infty$-norm. In addition, we
also prove that robust interpolation requires more model capacity, supporting
the evidence that adversarial training requires wider networks.