Deep neural networks (DNNs) are known to be vulnerable to adversarial attacks. That
is, adversarial examples, obtained by adding delicately crafted distortions
to original legitimate inputs, can mislead a DNN into classifying them as any
target label. This work provides a solution for hardening DNNs against
adversarial attacks through defensive dropout. Besides using dropout during
training for the best test accuracy, we propose also using dropout at test time
to achieve strong defense effects. We consider the problem of building robust DNNs as an
attacker-defender two-player game, where the attacker and the defender know
each other's strategies and try to optimize their own strategies towards an
equilibrium. Based on observations of how the test dropout rate affects
test accuracy and attack success rate, we propose a defensive dropout algorithm
to determine an optimal test dropout rate given the neural network model and
the attacker's strategy for generating adversarial examples. We also investigate
the mechanism behind the outstanding defense effects achieved by the proposed
defensive dropout. Compared with stochastic activation pruning (SAP), another
defense method that introduces randomness into the DNN model, we find that
our defensive dropout produces a much larger variance in the gradients, which is
the key to its improved defense effects (a much lower attack success rate). For
example, our defensive dropout can reduce the attack success rate from 100% to
13.89% under the C&W attack, currently the strongest attack, on the MNIST dataset.
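
To make the idea of test-time dropout concrete, the following is a minimal NumPy sketch, not the implementation used in this work. It assumes a two-layer fully connected network and standard inverted dropout on the hidden layer. The names `dropout`, `forward`, and `select_test_dropout_rate` are hypothetical, and the selection rule shown (the lowest attack success rate among rates whose accuracy loss stays within a tolerance) is an illustrative assumption, not the defensive dropout algorithm itself.

```python
import numpy as np


def dropout(x, rate, rng):
    """Inverted dropout: zero each unit with probability `rate`, rescale survivors."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if rate == 0.0:
        return x
    keep = 1.0 - rate
    mask = rng.random(x.shape) < keep  # True for units that are kept
    return x * mask / keep


def forward(x, params, test_dropout_rate, rng):
    """Two-layer MLP (assumed architecture) that keeps dropout active at test time."""
    w1, b1, w2, b2 = params
    hidden = np.maximum(x @ w1 + b1, 0.0)               # ReLU hidden layer
    hidden = dropout(hidden, test_dropout_rate, rng)    # randomness at inference
    return hidden @ w2 + b2                             # logits


def select_test_dropout_rate(measurements, baseline_accuracy, max_accuracy_drop):
    """Hypothetical selection rule, assumed for illustration only.

    `measurements` is an iterable of (rate, test_accuracy, attack_success_rate)
    tuples obtained by evaluating the model and the attacker at each rate.
    Among rates whose accuracy loss is at most `max_accuracy_drop`, return the
    rate with the lowest attack success rate.
    """
    admissible = [
        m for m in measurements if baseline_accuracy - m[1] <= max_accuracy_drop
    ]
    if not admissible:
        raise ValueError("no candidate rate satisfies the accuracy constraint")
    return min(admissible, key=lambda m: m[2])[0]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    params = (
        rng.normal(size=(8, 32)), np.zeros(32),
        rng.normal(size=(32, 3)), np.zeros(3),
    )
    x = rng.normal(size=(4, 8))
    # Repeated calls with a nonzero rate generally give different logits,
    # which is the source of randomness the attacker has to contend with.
    print(forward(x, params, 0.5, rng))
    print(forward(x, params, 0.5, rng))
```

Because the dropout mask is resampled on every forward pass, the gradients an attacker computes also vary from pass to pass; the accuracy and attack success measurements fed to the selection function would have to come from actual experiments.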