In recent years, Deep Neural Networks (DNNs) have had a dramatic impact on a
variety of problems long considered very difficult, e.g., image classification
and automatic language translation. The
accuracy of modern DNNs in classification tasks is remarkable indeed. At the
same time, attackers have devised powerful methods to construct
specially-crafted malicious inputs (often referred to as adversarial examples)
that can trick DNNs into misclassifying them. Worse still, despite
the many defense mechanisms proposed to protect DNNs against adversarial
attacks, attackers are often able to circumvent these defenses, rendering them
useless. This state of affairs is extremely worrying, especially as machine
learning systems are adopted at scale.
In this paper, we propose a scientific evaluation methodology aimed at
assessing the quality, efficacy, robustness and efficiency of randomized
defenses to protect DNNs against adversarial examples. Using this methodology,
we evaluate a variety of defense mechanisms. We also propose our own defense
mechanism, which we call Randomly Perturbed Ensemble Neural Networks (RPENNs).
We provide a comprehensive evaluation of the considered defense mechanisms
under a white-box attacker model, against six different adversarial attack
methods, using the ILSVRC2012 validation data set.