We present an efficient technique for training classification networks that
are verifiably robust against norm-bounded adversarial attacks.
This framework builds on the work of Gowal et al., who apply interval
arithmetic to bound the activations at each layer and keep the prediction
invariant to the input perturbation. While that method is faster than
competing approaches, it requires careful tuning of hyper-parameters and a
large number of epochs to converge. To speed up and stabilize training, we
add a term to the cost function that encourages the model to keep the
interval bounds at hidden layers small. Experimental results
demonstrate that we can achieve comparable (or even better) results with
fewer training iterations and more stable training. Moreover, the proposed
model is less sensitive to the exact specification of the training process,
which makes it easier for practitioners to use.
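
To make the idea concrete, the following is a minimal pure-Python sketch of interval bound propagation through a small fully connected ReLU network, together with a loss that adds a penalty on hidden-layer interval widths. It is an illustration under stated assumptions, not the authors' implementation: the function names, the use of the summed post-ReLU interval widths as the penalty, and the weights `kappa` and `lam` are hypothetical, and the mixture of nominal and worst-case cross-entropy reflects our reading of the Gowal et al. formulation.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]
Layer = Tuple[Matrix, Vector]  # (weights W, bias b) for y = W x + b


def linear_bounds(W: Matrix, b: Vector, lower: Vector, upper: Vector) -> Tuple[Vector, Vector]:
    """Propagate the box [lower, upper] through y = W x + b with interval arithmetic."""
    center = [(u + l) / 2 for l, u in zip(lower, upper)]
    radius = [(u - l) / 2 for l, u in zip(lower, upper)]
    new_lower, new_upper = [], []
    for row, bias in zip(W, b):
        mu = sum(w * c for w, c in zip(row, center)) + bias  # image of the box center
        r = sum(abs(w) * rad for w, rad in zip(row, radius))  # |W| times the box radius
        new_lower.append(mu - r)
        new_upper.append(mu + r)
    return new_lower, new_upper


def relu_bounds(lower: Vector, upper: Vector) -> Tuple[Vector, Vector]:
    """ReLU is monotone, so it can be applied to each bound separately."""
    return [max(0.0, l) for l in lower], [max(0.0, u) for u in upper]


def propagate(layers: List[Layer], x: Vector, eps: float) -> Tuple[Vector, Vector, List[float]]:
    """Bound the logits over the L-infinity ball of radius eps around x.

    Also returns, per hidden layer, the summed width of its post-ReLU intervals
    (ASSUMPTION: this is the quantity the hypothetical penalty keeps small).
    """
    lower = [xi - eps for xi in x]
    upper = [xi + eps for xi in x]
    hidden_widths = []
    for i, (W, b) in enumerate(layers):
        lower, upper = linear_bounds(W, b, lower, upper)
        if i < len(layers) - 1:  # no ReLU after the output layer
            lower, upper = relu_bounds(lower, upper)
            hidden_widths.append(sum(u - l for l, u in zip(lower, upper)))
    return lower, upper, hidden_widths


def worst_case_logits(lower: Vector, upper: Vector, label: int) -> Vector:
    """Lower bound for the true class, upper bound for every other class."""
    return [lower[j] if j == label else upper[j] for j in range(len(lower))]


def cross_entropy(logits: Vector, label: int) -> float:
    """Softmax cross-entropy computed with a numerically stable log-sum-exp."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    return lse - logits[label]


def training_loss(layers: List[Layer], x: Vector, label: int, eps: float,
                  kappa: float = 0.5, lam: float = 0.01) -> float:
    """Hypothetical loss: nominal/worst-case mixture plus lam * hidden interval widths."""
    nominal, _, _ = propagate(layers, x, 0.0)  # eps = 0 gives the ordinary forward pass
    lower, upper, widths = propagate(layers, x, eps)
    worst = worst_case_logits(lower, upper, label)
    return (kappa * cross_entropy(nominal, label)
            + (1 - kappa) * cross_entropy(worst, label)
            + lam * sum(widths))
```

Setting `eps=0` recovers the ordinary forward pass because every interval collapses to a point; increasing `lam` trades some nominal fit for tighter hidden-layer bounds, which is the effect the additional cost term is meant to encourage.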