An acknowledged weakness of neural networks is their vulnerability to
adversarial perturbations to the inputs. To improve the robustness of these
models, one of the most popular defense mechanisms is to alternatively maximize
the loss over the constrained perturbations (or called adversaries) on the
inputs using projected gradient ascent and minimize over weights. In this
paper, we analyze the dynamics of the maximization step towards understanding
the experimentally observed effectiveness of this defense mechanism.
Specifically, we investigate the non-concave landscape of the adversaries for a
two-layer neural network with a quadratic loss. Our main result proves that
projected gradient ascent finds a local maximum of this non-concave problem in
a polynomial number of iterations with high probability. To our knowledge, this
is the first work that provides a convergence analysis of the first-order
adversaries. Moreover, our analysis demonstrates that, in the initial phase of
adversarial training, the scale of the inputs matters in the sense that a
smaller input scale leads to faster convergence of adversarial training and a
"more regular" landscape. Finally, we show that these theoretical findings are
in excellent agreement with a series of experiments.