Machine learning systems are vulnerable to adversarial attacks and are
highly likely to produce incorrect outputs under them. Depending on the
adversary's level of access to the victim learning algorithm, attacks are
categorized as white-box or black-box. To defend learning systems against these attacks,
existing methods in the speech domain focus on modifying input signals and
testing the behaviours of speech recognizers. We, however, formulate the
defense as a classification problem and present a strategy for systematically
generating adversarial example datasets: one for white-box attacks and one for
black-box attacks, containing both adversarial and normal examples. The
white-box attack is a gradient-based method on Baidu DeepSpeech with the
Mozilla Common Voice database while the black-box attack is a gradient-free
method on a deep model-based keyword spotting system with the Google Speech
Commands dataset. The generated datasets are used to train a proposed
Convolutional Neural Network (CNN) on cepstral features to detect
adversarial examples. Experimental results show that it is possible to
accurately distinguish between adversarial and normal examples for known attacks,
in both single-condition and multi-condition training settings, while the
performance degrades dramatically for unknown attacks. The adversarial datasets
and the source code are made publicly available.
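The detector above works on cepstral features rather than raw waveforms. The abstract does not say which cepstral variant is used. The following is therefore only a minimal sketch of one common choice, the frame-wise real cepstrum (inverse FFT of the log magnitude spectrum). It assumes 16 kHz audio with 25 ms frames and a 10 ms hop (400 and 160 samples), a Hamming window, and 20 retained coefficients. The function names `frame_signal` and `real_cepstrum` are hypothetical and are not taken from the released source code. The resulting (frames × coefficients) matrix is the kind of 2-D input a CNN classifier could consume.

```python
import numpy as np

def frame_signal(x, frame_len=400, hop=160):
    """Split a 1-D signal into overlapping frames.

    Assumed defaults: 25 ms frames with a 10 ms hop at 16 kHz.
    Signals shorter than one frame are zero-padded to a single frame.
    """
    if len(x) < frame_len:
        x = np.pad(x, (0, frame_len - len(x)))
    n_frames = 1 + (len(x) - frame_len) // hop
    # Index matrix: row i selects samples [i*hop, i*hop + frame_len).
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

def real_cepstrum(x, frame_len=400, hop=160, n_coeffs=20, eps=1e-10):
    """Frame-wise real cepstrum: irfft(log|rfft(windowed frame)|).

    Returns an array of shape (n_frames, n_coeffs). This is an illustrative
    feature extractor, not necessarily the paper's exact front end.
    """
    frames = frame_signal(np.asarray(x, dtype=float), frame_len, hop)
    frames = frames * np.hamming(frame_len)       # taper frame edges
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    log_mag = np.log(magnitude + eps)             # eps avoids log(0)
    # The log magnitude of a real signal is real and even, so irfft
    # (with default length 2*(bins-1) == frame_len) gives a real cepstrum.
    ceps = np.fft.irfft(log_mag, axis=1)
    return ceps[:, :n_coeffs]                     # keep low-quefrency coefficients
```

For one second of 16 kHz audio (16000 samples), this yields 98 frames of 20 coefficients each. Stacking the frames over time gives an image-like input. That is one reason a CNN is a natural classifier for separating adversarial from normal examples.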