We investigate sensitivity and robustness in feedforward and convolutional
neural networks. Combining energy landscape techniques developed
in computational chemistry with tools drawn from formal methods, we produce
empirical evidence indicating that networks corresponding to lower-lying minima
in the optimization landscape of the learning objective tend to be more robust.
The robustness estimate used is the inverse of a proposed sensitivity measure,
which we define as the volume of an over-approximation of the reachable set of
network outputs under all additive $l_{\infty}$-bounded perturbations of the
input data. We present a novel loss function that includes a sensitivity term
in addition to the traditional task-oriented and regularization terms, as
sketched below. In our
experiments on standard machine learning and computer vision datasets, we show
that the proposed loss function leads to networks that reliably optimize the
robustness measure, as well as other related metrics of adversarial robustness,
without a significant increase in classification error. Experimental
results indicate that the proposed method outperforms state-of-the-art
sensitivity-based learning approaches with regard to robustness to adversarial
attacks. We also show that although the introduced framework does not
explicitly enforce an adversarial loss, it achieves competitive overall
performance relative to methods that do.
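For concreteness, the two central quantities can be sketched as follows; the
notation ($f_\theta$ for the network, $\varepsilon$ for the perturbation
budget, $\lambda$ for the term weights, $\mathcal{R}_\varepsilon$ for the
over-approximated reachable set) is ours and purely illustrative, not fixed by
the abstract.

% Hedged sketch, not the paper's exact formulation. For an input $x$, the
% sensitivity is the volume of an over-approximation of the set of outputs
% reachable under $l_\infty$-bounded input perturbations, and the proposed
% loss augments the task and regularization terms with this sensitivity.
\begin{align*}
  \mathcal{R}_\varepsilon(f_\theta, x) &\supseteq
    \{\, f_\theta(x + \delta) \;:\; \|\delta\|_\infty \le \varepsilon \,\}, \\
  S(f_\theta) &= \operatorname{vol}\bigl(\mathcal{R}_\varepsilon(f_\theta, x)\bigr),
    \qquad \text{robustness} \approx S(f_\theta)^{-1}, \\
  \mathcal{L}(\theta) &= \mathcal{L}_{\mathrm{task}}(\theta)
    + \lambda_{\mathrm{reg}}\,\Omega(\theta)
    + \lambda_{\mathrm{sens}}\,S(f_\theta).
\end{align*}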