Bayesian Neural Networks (BNNs) are trained to optimize an entire
distribution over their weights instead of a single set, having significant
advantages in terms of, e.g., interpretability, multi-task learning, and
calibration. Because of the intractability of the resulting optimization
problem, most BNNs are either sampled through Monte Carlo methods, or trained
by minimizing a suitable Evidence Lower BOund (ELBO) on a variational
approximation. In this paper, we propose a variant of the latter, wherein we
replace the Kullback-Leibler divergence in the ELBO term with a Maximum Mean
Discrepancy (MMD) estimator, inspired by recent work in variational inference.
After motivating our proposal based on the properties of the MMD term, we
proceed to show a number of empirical advantages of the proposed formulation
over the state-of-the-art. In particular, our BNNs achieve higher accuracy on
multiple benchmarks, including several image classification tasks. In addition,
they are more robust to the selection of a prior over the weights, and they are
better calibrated. As a second contribution, we provide a new formulation for
estimating the uncertainty on a given prediction, showing it performs in a more
robust fashion against adversarial attacks and the injection of noise over
their inputs, compared to more classical criteria such as the differential
entropy.