In the past few years, it has been shown that deep learning systems are
highly vulnerable to adversarial examples. Neural-network-based
automatic speech recognition (ASR) systems are no exception. Targeted and
untargeted attacks can modify an audio input signal in such a way that humans
still recognise the same words, while ASR systems are steered to predict a
different transcription. In this paper, we propose a defense mechanism against
targeted adversarial attacks that removes fast-changing features from
the audio signals by applying slow feature analysis, a low-pass filter,
or both, before feeding the input to the ASR system. We perform an empirical
analysis of hybrid ASR models trained on data pre-processed in such a way.
The resulting models perform well on benign data and are
significantly more robust against targeted adversarial attacks: our final
proposed model performs similarly to the baseline model on clean data
while being more than four times more robust.
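
To make the low-pass variant of this pre-processing concrete, the following is a minimal sketch of removing fast-changing components from a waveform before it reaches a recogniser. It is an illustration only: the function names `lowpass_fir` and `apply_fir`, the Hamming-windowed sinc design, the 16 kHz sampling rate, and the 4 kHz cutoff are all assumptions made for the example and are not taken from the paper. Slow feature analysis is not covered here.

```python
import math


def lowpass_fir(num_taps, cutoff_hz, sample_rate_hz):
    """Return Hamming-windowed sinc low-pass FIR taps with unity DC gain.

    Hypothetical helper for illustration; the design and parameters are
    assumptions, not the filter used in the paper.
    """
    if num_taps < 3 or num_taps % 2 == 0:
        raise ValueError("num_taps must be an odd integer >= 3")
    if not 0 < cutoff_hz < sample_rate_hz / 2:
        raise ValueError("cutoff_hz must lie strictly between 0 and Nyquist")

    fc = cutoff_hz / sample_rate_hz  # cutoff in cycles per sample
    mid = (num_taps - 1) // 2
    taps = []
    for n in range(num_taps):
        k = n - mid
        # Ideal (infinite) low-pass impulse response, sampled at offset k.
        ideal = 2 * fc if k == 0 else math.sin(2 * math.pi * fc * k) / (math.pi * k)
        # Hamming window tapers the truncated response to reduce ripple.
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)

    total = sum(taps)
    return [t / total for t in taps]  # normalise so that the DC gain is 1


def apply_fir(signal, taps):
    """Filter `signal` with symmetric `taps`; output has the same length.

    The kernel is centred on each sample, so the filter delay is compensated.
    Samples outside the signal are treated as zero.
    """
    mid = (len(taps) - 1) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, t in enumerate(taps):
            idx = i + mid - j
            if 0 <= idx < len(signal):
                acc += t * signal[idx]
        out.append(acc)
    return out


# Example: assumed 16 kHz audio, assumed 4 kHz cutoff.
SAMPLE_RATE_HZ = 16000
taps = lowpass_fir(num_taps=101, cutoff_hz=4000, sample_rate_hz=SAMPLE_RATE_HZ)
waveform = [
    math.sin(2 * math.pi * 500 * n / SAMPLE_RATE_HZ)           # slow component
    + 0.5 * math.sin(2 * math.pi * 7000 * n / SAMPLE_RATE_HZ)  # fast component
    for n in range(800)
]
filtered = apply_fir(waveform, taps)  # would be passed on to the ASR front end
```

In this sketch the 500 Hz component passes almost unchanged while the 7 kHz component is strongly attenuated. In a setting like the one described above, the same filtering would be applied to the training data as well as to the inputs at test time, so that the model is trained on the representation it later sees.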