Various adversarial audio attacks have recently been developed to fool
automatic speech recognition (ASR) systems. Here we propose a defense against
such attacks based on the uncertainty introduced by dropout in neural networks.
We show that our defense is able to detect attacks created through optimized
perturbations and frequency masking on a state-of-the-art end-to-end ASR
system. Furthermore, the defense can be made robust against attacks that are
immune to noise reduction. We test our defense on Mozilla's CommonVoice
dataset, the UrbanSound dataset, and an excerpt of the LibriSpeech dataset,
showing that it achieves high detection accuracy in a wide range of scenarios.
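To make the detection idea concrete, the sketch below illustrates one common way dropout-based uncertainty can be estimated (Monte Carlo dropout): the network is run several times with dropout kept active, and the spread across passes serves as an uncertainty score. The toy model, dropout rate, number of passes, and threshold are illustrative assumptions, not the paper's actual ASR system or settings.

```python
import torch
import torch.nn as nn

# Toy stand-in for an acoustic model; the paper's end-to-end
# ASR system is not reproduced here.
model = nn.Sequential(
    nn.Linear(128, 256),
    nn.ReLU(),
    nn.Dropout(p=0.2),   # dropout layer kept active at inference
    nn.Linear(256, 29),  # e.g., per-frame character logits
)

def mc_dropout_uncertainty(model, x, n_passes=30):
    """Run n stochastic forward passes with dropout enabled and
    return the mean variance of the outputs across passes."""
    model.train()  # keeps dropout active (no weights are updated)
    with torch.no_grad():
        outs = torch.stack([model(x) for _ in range(n_passes)])
    return outs.var(dim=0).mean().item()

x = torch.randn(1, 128)        # placeholder input features
score = mc_dropout_uncertainty(model, x)
is_adversarial = score > 0.05  # threshold is illustrative only
```

An input whose score exceeds a calibrated threshold would be flagged as adversarial; in practice the threshold is chosen from the uncertainty distribution of benign inputs.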