Audio processing models based on deep neural networks are susceptible to
adversarial attacks even when the adversarial audio waveform is 99.9% similar
to a benign sample. Given the wide application of DNN-based audio recognition
systems, detecting the presence of adversarial examples is of high practical
relevance. By applying anomalous pattern detection techniques in the activation
space of these models, we show that two recent state-of-the-art adversarial
attacks on audio processing systems systematically lead to higher-than-expected
activations at some subset of nodes, and that we can detect these attacks with
an AUC of up to 0.98 with no degradation in performance on
benign samples.
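
The abstract does not spell out the particular anomalous pattern detection technique, so the following is only a minimal sketch of one common approach in this family: a nonparametric (Berk-Jones-style) subset scan over per-node empirical p-values computed against benign activations. All function names, array shapes, and the `alpha_max` parameter are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def empirical_pvalues(benign_acts, test_acts):
    """Per-node empirical p-values: the fraction of benign activations that
    are at least as large as the test activation (small p-value = unusually
    high activation). benign_acts: (n_benign, n_nodes), test_acts: (n_nodes,)."""
    higher = (benign_acts >= test_acts[None, :]).sum(axis=0)
    return (higher + 1) / (benign_acts.shape[0] + 1)

def berk_jones_scan(pvalues, alpha_max=0.5):
    """Scan over significance thresholds alpha and score the subset of nodes
    whose p-values fall below alpha, using a Berk-Jones-style scan statistic.
    Returns the best (score, alpha); a large score suggests a subset of nodes
    with higher-than-expected activations."""
    n = len(pvalues)
    best_score, best_alpha = 0.0, None
    for k, alpha in enumerate(np.sort(pvalues), start=1):
        if alpha > alpha_max:
            break
        obs = k / n            # observed fraction of p-values <= alpha
        if obs <= alpha:       # not more significant p-values than expected
            continue
        score = obs * np.log(obs / alpha)
        if obs < 1.0:
            score += (1 - obs) * np.log((1 - obs) / (1 - alpha))
        score *= n
        if score > best_score:
            best_score, best_alpha = score, alpha
    return best_score, best_alpha

# Hypothetical usage: score every sample in a batch of hidden-layer
# activations; higher scores are flagged as likely adversarial, and an AUC
# can then be computed from benign vs. adversarial scores.
rng = np.random.default_rng(0)
benign_acts = rng.normal(size=(500, 256))          # calibration activations
test_acts = rng.normal(size=256) + 0.5             # a suspicious sample
pvals = empirical_pvalues(benign_acts, test_acts)
score, alpha = berk_jones_scan(pvals)
print(f"scan score={score:.2f} at alpha={alpha}")
```

Because the scan statistic only scores how surprising the joint pattern of activations is, benign samples are left untouched at inference time, which is consistent with the claim of no degradation in benign performance.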