We present a new algorithm for training neural networks that are robust to
adversarial attacks. Our algorithm is motivated by the following two ideas.
First, although recent work has demonstrated that injecting randomness can improve
the robustness of neural networks (Liu 2017), we observe that blindly adding
noise to every layer is not the best way to incorporate randomness.
Instead, we model randomness within the framework of a Bayesian neural network
(BNN) to formally learn the posterior distribution of models in a scalable way.
Second, we formulate a min-max problem in the BNN to learn the best model
distribution under adversarial attacks, leading to an adversarially trained
Bayesian neural network. Experimental results demonstrate that the proposed algorithm
achieves state-of-the-art performance under strong attacks. On CIFAR-10 with a
VGG network, our model achieves a 14\% accuracy improvement over
adversarial training (Madry 2017) and random self-ensemble (Liu 2017) under PGD
attack with $0.035$ distortion, and the gap becomes even larger on a subset of
ImageNet.
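
As a rough illustration of the second idea, one way such a min-max problem could be written is as a variational objective in which each input is replaced by its worst-case perturbation. The notation here is an assumption made for this sketch and is not taken from the text above: $q_\theta(w)$ is a variational posterior over weights $w$ with parameters $\theta$, $p(w)$ is a prior, $\delta_i$ is the perturbation applied to input $x_i$ with label $y_i$, and $\epsilon$ is the distortion budget under an unspecified norm.

\[
\max_{\theta} \; \sum_{i=1}^{n} \; \min_{\|\delta_i\| \le \epsilon} \;
\mathbb{E}_{w \sim q_\theta} \big[ \log p(y_i \mid x_i + \delta_i, w) \big]
\; - \; \mathrm{KL}\big( q_\theta(w) \,\|\, p(w) \big)
\]

Under these assumptions, the inner minimization plays the role of the attacker (for example, approximated with PGD steps), while the outer maximization fits the model distribution to the perturbed data and the KL term keeps it close to the prior.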