Adversarial training, in which a network is trained on both adversarial and
clean examples, is one of the most trusted defense methods against adversarial
attacks. However, it poses three major practical difficulties in implementation
and deployment: high extra memory and computation costs; an accuracy trade-off
between clean and adversarial examples; and a lack of diversity in adversarial
perturbations. Classical adversarial training uses
fixed, precomputed perturbations in adversarial examples (input space). In
contrast, we introduce dynamic adversarial perturbations into the parameter
space of the network by adding perturbation biases to the fully connected
layers of a deep convolutional neural network. During training, using only clean
images, the perturbation biases are updated in the fast gradient sign direction,
automatically creating and storing adversarial perturbations by recycling the
gradient information already computed. The network then learns to adjust itself
automatically to these stored adversarial perturbations. Thus, we achieve
adversarial training at negligible cost compared to methods that require a
training set of adversarial example images. In addition, when combined with classical
adversarial training, our perturbation biases can alleviate the accuracy
trade-off and diversify the adversarial perturbations.