Abstract
The ubiquitous presence of machine learning systems in our lives necessitates
research into their vulnerabilities and appropriate countermeasures. In
particular, we investigate the effectiveness of adversarial attacks and
defenses against automatic speech recognition (ASR) systems. We select two ASR
models: a thoroughly studied DeepSpeech model and a more recent Transformer
encoder-decoder model from the Espresso framework. We investigate two threat
models: a denial-of-service scenario, where fast gradient sign method (FGSM) or
weak projected gradient descent (PGD) attacks are used to degrade recognition
by increasing the model's word error rate (WER); and a targeted scenario, where
a more potent imperceptible attack forces the system to recognize a specific
phrase. We find that the
attack transferability across the investigated ASR systems is limited. To
defend the model, we use two preprocessing defenses, randomized smoothing and a
WaveGAN-based vocoder, and find that both significantly improve the model's
adversarial robustness. We show that a WaveGAN vocoder can be a useful
countermeasure to adversarial attacks on ASR systems: even when it is attacked
jointly with the ASR model, the WER on the target phrases remains high.
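The sketch below is a rough illustration of the quantities named in the abstract. It is not the authors' code. It computes WER as a word-level edit distance. It also applies FGSM and an L-infinity PGD attack to a toy input, and shows one simple form of randomized smoothing: Gaussian noise added to the waveform before the model sees it. The paper's exact variant may differ. Several pieces are hypothetical stand-ins: the quadratic loss `0.5 * ||W x - y||^2`, the matrices `W` and `y`, the function names, and the parameters `eps`, `alpha`, `steps` and `sigma`. A real attack would instead backpropagate through the ASR model's own training loss, for example DeepSpeech's CTC loss. The WaveGAN vocoder defense is not sketched.

```python
import numpy as np


def word_error_rate(reference: str, hypothesis: str) -> float:
    """(substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i, j] = word-level edit distance between ref[:i] and hyp[:j]
    d = np.zeros((len(ref) + 1, len(hyp) + 1), dtype=int)
    d[:, 0] = np.arange(len(ref) + 1)
    d[0, :] = np.arange(len(hyp) + 1)
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i, j] = min(d[i - 1, j] + 1,         # deletion
                          d[i, j - 1] + 1,         # insertion
                          d[i - 1, j - 1] + cost)  # substitution or match
    return float(d[len(ref), len(hyp)]) / max(len(ref), 1)


# Hypothetical stand-in for an ASR loss: 0.5 * ||W x - y||^2 over a toy "waveform" x.
rng = np.random.default_rng(0)
W = rng.standard_normal((8, 16))
y = rng.standard_normal(8)


def loss(x):
    r = W @ x - y
    return 0.5 * float(r @ r)


def loss_grad(x):
    # Analytic gradient of the toy loss; a real attack would use autograd on the ASR.
    return W.T @ (W @ x - y)


def fgsm(x, eps):
    # FGSM: one step of size eps along the sign of the gradient (increases the loss),
    # then clip to the valid audio range [-1, 1].
    return np.clip(x + eps * np.sign(loss_grad(x)), -1.0, 1.0)


def pgd(x, eps, alpha, steps):
    # PGD: repeated signed-gradient steps of size alpha, each projected back onto the
    # L-inf ball of radius eps around x. This variant returns the highest-loss iterate.
    x_adv, best, best_loss = x.copy(), x.copy(), loss(x)
    for _ in range(steps):
        x_adv = x_adv + alpha * np.sign(loss_grad(x_adv))
        x_adv = np.clip(x_adv, x - eps, x + eps)  # project onto the eps-ball
        x_adv = np.clip(x_adv, -1.0, 1.0)         # keep a valid waveform
        if loss(x_adv) > best_loss:
            best, best_loss = x_adv.copy(), loss(x_adv)
    return best


def randomized_smoothing(x, sigma, rng):
    # Preprocessing defense: add Gaussian noise to the input before the ASR model.
    return x + sigma * rng.standard_normal(x.shape)


x = rng.uniform(-0.5, 0.5, 16)  # toy "waveform"
x_fgsm = fgsm(x, eps=0.05)
x_pgd = pgd(x, eps=0.05, alpha=0.01, steps=20)
x_smoothed = randomized_smoothing(x_pgd, sigma=0.01, rng=rng)
print(word_error_rate("the cat sat on the mat", "the bat sat on mat"))  # 0.3333333333333333
```

In the WER example there is one substitution ("cat" becomes "bat") and one deletion (the second "the"). That gives 2 errors over 6 reference words. The denial-of-service attacks only need to push this number up. A targeted attack instead aims to drive the WER against the attacker's chosen phrase down. For that reason, a high WER on the target phrases indicates that the defense held.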