Recently, it has been shown that deep neural networks (DNNs) are subject to
attacks through adversarial samples. Adversarial samples are often crafted
through adversarial perturbation, i.e., manipulating the original sample with
minor modifications so that the DNN model labels the sample incorrectly. Given
that it is almost impossible to train a perfect DNN, adversarial samples have
been shown to be easy to generate. As DNNs are increasingly used in
safety-critical systems such as autonomous cars, it is crucial to develop
techniques for defending against such attacks. Existing defense mechanisms,
which aim to make crafting adversarial perturbations more difficult, have been
shown to be ineffective. In this work, we
propose an alternative approach. We first observe that adversarial samples are
much more sensitive to perturbations than normal samples. That is, if we impose
random perturbations on a normal sample and an adversarial sample
respectively, the rates at which their predicted labels change differ
significantly. Based on this observation, we design a statistical adversary
detection algorithm called nMutant (inspired by mutation testing from the
software engineering community). Our experiments show that nMutant effectively detects
most of the adversarial samples generated by recently proposed attack methods.
Furthermore, along with each detection, we provide an error bound at a given
level of statistical significance.
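As a rough illustration of the underlying idea, the sketch below estimates how
often the predicted label of an input flips under small random perturbations
and flags the input as adversarial when the flip rate is high. The `predict`
callable, the perturbation magnitude `epsilon`, the number of mutants, and the
fixed decision threshold are illustrative assumptions only; the actual nMutant
algorithm relies on a statistical test that yields the error bound mentioned
above.

    import numpy as np

    def label_change_rate(predict, x, n_mutants=100, epsilon=0.05, seed=0):
        """Fraction of randomly perturbed copies of x whose predicted label
        differs from the label predicted for x itself."""
        rng = np.random.default_rng(seed)
        original_label = predict(x)
        changes = 0
        for _ in range(n_mutants):
            noise = rng.uniform(-epsilon, epsilon, size=x.shape)
            mutant = np.clip(x + noise, 0.0, 1.0)  # keep inputs in a valid range
            if predict(mutant) != original_label:
                changes += 1
        return changes / n_mutants

    def looks_adversarial(predict, x, threshold=0.05, **kwargs):
        """Flag x as likely adversarial when its label-change rate under
        random perturbation exceeds a fixed threshold (hypothetical value)."""
        return label_change_rate(predict, x, **kwargs) > threshold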