The fragility of deep neural networks to adversarially chosen inputs has
motivated a re-examination of deep learning algorithms. Including adversarial
examples during training is a popular defense mechanism against adversarial
attacks. This mechanism can be formulated as a min-max optimization problem,
where the adversary seeks to maximize the loss function using an iterative
first-order algorithm while the learner attempts to minimize it. However,
finding adversarial examples in this way causes excessive computational
overhead during training. By interpreting the min-max problem as an optimal
control problem, recent work has shown that the compositional structure of
neural networks can be exploited to reduce training time significantly.
In this paper, we provide the first
convergence analysis of this adversarial training algorithm by combining
techniques from robust optimal control and inexact oracle methods in
optimization. Our analysis sheds light on how the hyperparameters of the
algorithm affect its stability and convergence. We support our insights
with experiments on a robust classification problem.
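
For concreteness, one standard instance of the min-max objective sketched above, with notation introduced here purely for illustration ($\theta$ the network parameters, $\delta$ a perturbation of norm at most $\epsilon$, $\ell$ the training loss, and $\mathcal{D}$ the data distribution), is
\[
\min_{\theta} \; \mathbb{E}_{(x,y)\sim\mathcal{D}} \Big[ \max_{\|\delta\| \le \epsilon} \ell\big(f_{\theta}(x+\delta),\, y\big) \Big].
\]
The optimal control interpretation typically rests on viewing the forward pass of a $T$-layer network as a discrete-time dynamical system,
\[
x_{t+1} = f_t(x_t, \theta_t), \qquad t = 0, \dots, T-1, \qquad x_0 = x + \delta,
\]
whose layer-wise (compositional) structure control-theoretic methods can exploit; the exact formulation analyzed in the paper may differ.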