Smoothing classifiers and probability density functions with Gaussian kernels
appear unrelated, but in this work they are unified for the problem of robust
classification. The key building block is approximating the $\textit{energy
function}$ of the random variable $Y=X+N(0,\sigma^2 I_d)$ with a neural
network, which we then use to formulate the problem of robust classification
in terms of $\widehat{x}(Y)$, the $\textit{Bayes estimator}$ of $X$ given the
noisy measurements $Y$.
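For orientation, under this Gaussian noise model the (least-squares) Bayes
estimator has a classical closed form in terms of the smoothed density
$p(y)=\int p(x)\,N(y;x,\sigma^2 I_d)\,dx$ and its energy $f(y)=-\log p(y)$,
a standard empirical Bayes identity stated here for reference rather than as
a new result:
\[
\widehat{x}(y) \;=\; y + \sigma^2 \nabla_y \log p(y) \;=\; y - \sigma^2 \nabla_y f(y).
\]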
We introduce $\textit{empirical Bayes smoothed classifiers}$
within the framework of $\textit{randomized smoothing}$ and study them
theoretically for the two-class linear classifier, where we show that one can
improve their robustness above $\textit{the margin}$.
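To make the baseline concrete (a standard randomized smoothing computation,
not specific to this paper): for a linear classifier
$x \mapsto \mathrm{sign}(w^\top x + b)$ smoothed with $N(0,\sigma^2 I_d)$,
the certified $\ell_2$ radius of Cohen et al.'s form $\sigma\,\Phi^{-1}(p)$
reduces exactly to the margin,
\[
r \;=\; \sigma\,\Phi^{-1}\!\Big(\Phi\Big(\tfrac{|w^\top x + b|}{\sigma\|w\|_2}\Big)\Big)
\;=\; \frac{|w^\top x + b|}{\|w\|_2},
\]
so certifying any radius beyond the margin requires going beyond smoothing
the linear classifier alone.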
We test the theory on
MNIST, and we show that with a learned smoothed energy function and a linear
classifier we can achieve provable $\ell_2$ robust accuracies that are
competitive with empirical defenses. This setup can be significantly improved
by $\textit{learning}$ empirical Bayes smoothed classifiers with adversarial
training, and on MNIST we show that we can achieve provable robust accuracies
higher than the state-of-the-art empirical defenses in a range of radii.
We
discuss some fundamental challenges of randomized smoothing based on a
geometric interpretation arising from the concentration of Gaussians in high
dimensions, and we finish the paper with a proposal for using walk-jump
sampling, itself based on learned smoothed densities, for robust
classification.
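(For scale, a draw from $N(0,\sigma^2 I_d)$ concentrates near the sphere of
radius $\sigma\sqrt{d}$, which is the geometric picture underlying these
challenges.) As a minimal sketch of walk-jump sampling, assuming access to a
learned gradient $\nabla_y f(y)$ of the smoothed energy (the
\texttt{energy\_grad} callable, step sizes, and toy example below are ours,
for illustration only): the ``walk'' is Langevin MCMC on the smoothed density
$p(y)\propto e^{-f(y)}$, and the ``jump'' is a single application of the Bayes
estimator above.
\begin{verbatim}
import numpy as np

def walk_jump_sample(energy_grad, y0, sigma, n_steps=1000, step=1e-3, rng=None):
    # Walk: unadjusted Langevin dynamics targeting the smoothed density
    # p(y) proportional to exp(-f(y)).
    rng = rng or np.random.default_rng()
    y = np.array(y0, dtype=float)
    for _ in range(n_steps):
        y = (y - 0.5 * step * energy_grad(y)
             + np.sqrt(step) * rng.standard_normal(y.shape))
    # Jump: empirical Bayes estimate of the clean X given the noisy y,
    # x_hat = y - sigma^2 * grad f(y).
    return y - sigma**2 * energy_grad(y)

# Toy check: for X ~ N(0, I_d) the smoothed energy is
# f(y) = ||y||^2 / (2 (1 + sigma^2)), so grad f(y) = y / (1 + sigma^2),
# and the jump recovers the exact posterior mean y / (1 + sigma^2).
sigma = 0.5
x_hat = walk_jump_sample(lambda y: y / (1 + sigma**2),
                         y0=np.zeros(2), sigma=sigma)
\end{verbatim}
The division of labor is that the walk explores the better-conditioned
smoothed density while the jump denoises in a single step; this sketch only
illustrates that structure, not the paper's experimental setup.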