Neural networks with low-precision weights and activations offer compelling
efficiency advantages over their full-precision equivalents. The two most
frequently discussed benefits of quantization are reduced memory consumption,
and a faster forward pass when implemented with efficient bitwise operations.
We propose a third benefit of very low-precision neural networks: improved
robustness against some adversarial attacks, and in the worst case, performance
that is on par with full-precision models. We focus on the very low-precision
case where weights and activations are both quantized to $\pm$1, and note that
stochastically quantizing weights in just one layer can sharply reduce the
impact of iterative attacks. We observe that non-scaled binary neural networks
exhibit a similar effect to the original defensive distillation procedure that
led to gradient masking, and a false notion of security. We address this by
conducting both black-box and white-box experiments with binary models that do
not artificially mask gradients.