Neural networks are known to be vulnerable to adversarial examples. Carefully
chosen perturbations to real images, while imperceptible to humans, induce
misclassification and threaten the reliability of deep learning systems in the
wild. To guard against adversarial examples, we take inspiration from game
theory and cast the problem as a minimax zero-sum game between the adversary
and the model. In general, for such games, the optimal strategy for both
players requires a stochastic policy, also known as a mixed strategy. In this
light, we propose Stochastic Activation Pruning (SAP), a mixed strategy for
adversarial defense. SAP prunes a random subset of activations (preferentially
pruning those with smaller magnitude) and scales up the survivors to
compensate. SAP can be applied to pretrained networks, including adversarially
trained models, without fine-tuning. Experiments demonstrate that SAP confers
robustness against adversarial attacks, increasing accuracy under attack and
preserving calibration.