Adversarial examples are perturbed inputs, typically crafted using a deep
learning network's (DLN) gradients, designed to mislead the DLN at test
time. Intuitively, constraining the dimensionality of inputs or parameters of a
network reduces the 'space' in which adversarial examples exist. Guided by this
intuition, we demonstrate that discretization greatly improves the robustness
of DLNs against adversarial attacks. Specifically, discretizing the input space
(i.e., reducing the allowed pixel levels from 256 values, or 8-bit, to 4
values, or 2-bit) substantially improves the adversarial robustness of DLNs
across a wide range of perturbations, with minimal loss in test accuracy.
Furthermore, we find that
Binary Neural Networks (BNNs) and related variants are intrinsically more
robust than their full-precision counterparts in adversarial scenarios.
Combining input discretization with BNNs improves robustness further, even
obviating the need for adversarial training for certain ranges of perturbation
magnitudes.
We evaluate the effect of discretization on the MNIST, CIFAR-10, CIFAR-100,
and ImageNet datasets. Across all datasets, we observe maximal adversarial
resistance with 2-bit input discretization, which incurs an adversarial
accuracy loss of just ~1-2% relative to the clean test accuracy.
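To make the input discretization concrete, the following is a minimal sketch
(not the paper's code) of uniformly quantizing pixel values, assumed to be
normalized to [0, 1], down to 2-bit (4 levels); the function name and the use
of NumPy are illustrative assumptions.

```python
import numpy as np

def discretize_input(x, bits=2):
    """Uniformly quantize pixel values in [0, 1] to 2**bits levels.

    Illustrative sketch: 8-bit images have 256 levels; bits=2 keeps only
    4 levels, so small adversarial perturbations often collapse into the
    same quantization bin as the clean pixel.
    """
    levels = 2 ** bits                      # e.g., 4 levels for 2-bit
    x = np.clip(x, 0.0, 1.0)                # assumes inputs normalized to [0, 1]
    return np.round(x * (levels - 1)) / (levels - 1)

# Example: an epsilon-bounded perturbation is absorbed by coarse quantization.
clean = np.array([0.50, 0.21, 0.77])
perturbed = clean + 0.03
print(discretize_input(clean))              # [0.6667 0.3333 0.6667]
print(discretize_input(perturbed))          # same bins -> identical output
```

In this example the perturbed pixels fall into the same bins as the clean
ones, so the network receives an identical input, which is the intuition
behind the robustness gain.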
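Similarly, the BNN result rests on constraining weights (and often
activations) to two values. Below is a minimal NumPy sketch of the
deterministic sign-based binarization commonly used in BNNs; this is an
illustrative assumption, not the authors' implementation.

```python
import numpy as np

def binarize(w):
    """Deterministic sign binarization (sketch): values become {-1, +1},
    shrinking the parameter space available to an attacker."""
    return np.where(w >= 0, 1.0, -1.0)

# Forward pass of one binarized fully-connected layer (illustrative only;
# training a BNN keeps full-precision latent weights and uses a
# straight-through estimator for the gradient of sign()).
rng = np.random.default_rng(0)
w_latent = rng.normal(size=(4, 3))          # full-precision latent weights
x = binarize(rng.normal(size=(1, 4)))       # binarized activations
y = x @ binarize(w_latent)                  # binary-binary dot products
print(y)
```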