We propose a novel adaptive empirical Bayesian method for sparse deep
learning, where the sparsity is ensured via a class of self-adaptive
spike-and-slab priors. The proposed method works by alternately sampling from
an adaptive hierarchical posterior distribution using stochastic gradient
Markov chain Monte Carlo (MCMC) and smoothly optimizing the hyperparameters
using stochastic approximation (SA). We further prove the convergence of the
proposed method to the asymptotically correct distribution under mild
conditions. Empirically, the proposed method achieves state-of-the-art
performance on MNIST and Fashion-MNIST with shallow convolutional neural
networks and state-of-the-art compression performance on CIFAR-10 with
Residual Networks. The proposed method also improves resistance
to adversarial attacks.
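
The alternation between sampling and hyperparameter optimization can be illustrated on a toy problem. The following is a minimal sketch, not the implementation used in this work: it assumes a sparse linear regression model rather than a deep network, a Gaussian spike and a Gaussian slab (which may differ from the priors actually proposed), stochastic gradient Langevin dynamics (SGLD) as the stochastic gradient MCMC sampler, and a simplified update for the slab proportion. The function name, the synthetic data, and all parameter values (v0, v1, eps, the step-size exponent 0.7) are hypothetical choices made for illustration only.

```python
import numpy as np


def sgld_sa_sparse_regression(seed=0, n_steps=2000, batch=50, eps=1e-3,
                              v0=0.01, v1=10.0, sigma2=1.0):
    """Toy alternation of SGLD sampling and stochastic approximation (SA).

    Assumed prior (illustrative): each weight is drawn from the mixture
    (1 - delta) * N(0, v0) + delta * N(0, v1), with v0 << v1.
    """
    rng = np.random.default_rng(seed)

    # Hypothetical synthetic data: only the first two weights are nonzero.
    N, p = 200, 10
    beta_true = np.zeros(p)
    beta_true[:2] = [3.0, -2.0]
    X = rng.standard_normal((N, p))
    y = X @ beta_true + np.sqrt(sigma2) * rng.standard_normal(N)

    beta = np.zeros(p)      # model weights, updated by SGLD
    rho = np.full(p, 0.5)   # slab-inclusion probabilities, updated by SA
    delta = 0.5             # prior slab proportion, updated by SA

    for k in range(n_steps):
        # Sampling step: one SGLD move on the weights, conditioned on the
        # current hyperparameters (rho acts as a per-weight shrinkage).
        idx = rng.choice(N, size=batch, replace=False)
        resid = y[idx] - X[idx] @ beta
        grad_lik = (N / batch) * (X[idx].T @ resid) / sigma2
        grad_prior = -(rho / v1 + (1.0 - rho) / v0) * beta
        noise = np.sqrt(2.0 * eps) * rng.standard_normal(p)
        beta = beta + eps * (grad_lik + grad_prior) + noise

        # Optimization step: probability that each weight belongs to the
        # slab given its current value, computed in log space for stability.
        log_slab = -0.5 * np.log(v1) - beta**2 / (2.0 * v1)
        log_spike = -0.5 * np.log(v0) - beta**2 / (2.0 * v0)
        log_odds = np.log(delta) - np.log1p(-delta) + log_slab - log_spike
        rho_new = 1.0 / (1.0 + np.exp(-log_odds))

        # Smooth SA update with a decaying step size omega_k.
        omega = (k + 1.0) ** -0.7
        rho = (1.0 - omega) * rho + omega * rho_new
        delta_new = (1.0 - omega) * delta + omega * rho.mean()
        delta = float(np.clip(delta_new, 1e-6, 1.0 - 1e-6))

    return beta, rho, delta


if __name__ == "__main__":
    beta, rho, delta = sgld_sa_sparse_regression()
    print("last weight sample:", np.round(beta, 2))
    print("inclusion probabilities:", np.round(rho, 2))
    print("slab proportion:", round(delta, 3))
```

In this sketch the inclusion probabilities are averaged across iterations rather than recomputed from a single noisy sample, which is what makes the hyperparameter update smooth; the decaying step size is the standard requirement for stochastic approximation.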