VenoMave: Targeted Poisoning Against Speech Recognition

arxiv

AI Security Portal bot

Information in the literature database is collected automatically.

Source

https://arxiv.org/abs/2010.10682

PDF

https://arxiv.org/pdf/2010.10682

Paper Information

Authors: Hojjat Aghakhani; Lea Schönherr; Thorsten Eisenhofer; Dorothea Kolossa; Thorsten Holz; Christopher Kruegel; Giovanni Vigna
Published: 2020-10-21
Updated: 2023-04-21
Affiliation: University of California, Santa Barbara
Country: United States of America
Conference: Conference on Secure and Trustworthy Machine Learning (SaTML)

Labels Estimated by AI

Poisoning; Poisoning Attack; Backdoor Attack

These labels were automatically added by AI and may be inaccurate.

Abstract

Despite remarkable improvements, automatic speech recognition is susceptible to adversarial perturbations. Compared to standard machine learning architectures, these attacks are significantly more challenging, especially since the inputs to a speech recognition system are time series that contain both acoustic and linguistic properties of speech. Extracting all recognition-relevant information requires more complex pipelines and an ensemble of specialized components. Consequently, an attacker needs to consider the entire pipeline. In this paper, we present VENOMAVE, the first training-time poisoning attack against speech recognition. Similar to the predominantly studied evasion attacks, we pursue the same goal: leading the system to an incorrect and attacker-chosen transcription of a target audio waveform. In contrast to evasion attacks, however, we assume that the attacker can only manipulate a small part of the training data without altering the target audio waveform at runtime. We evaluate our attack on two datasets: TIDIGITS and Speech Commands. When poisoning less than 0.17% of the dataset, VENOMAVE achieves attack success rates of more than 80.0%, without access to the victim's network architecture or hyperparameters. In a more realistic scenario, when the target audio waveform is played over the air in different rooms, VENOMAVE maintains a success rate of up to 73.3%. Finally, VENOMAVE achieves an attack transferability rate of 36.4% between two different model architectures.
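The two headline quantities in the abstract, the poisoning budget ("less than 0.17% of the dataset") and the attack success rate, are simple ratios. The following is a minimal sketch of that arithmetic only, not the authors' evaluation code. The function names, the dataset size of 10,000, and the example outcomes are hypothetical values chosen for illustration; they do not come from the paper.

```python
import math
from fractions import Fraction


def max_poison_samples(dataset_size: int, budget: str) -> int:
    """Largest sample count that stays strictly below the poisoning budget.

    `budget` is a decimal fraction given as a string (e.g. "0.0017" for
    0.17%) so that it converts to an exact Fraction without float rounding.
    """
    limit = dataset_size * Fraction(budget)
    # ceil(limit) - 1 is the largest integer strictly less than `limit`.
    return max(math.ceil(limit) - 1, 0)


def attack_success_rate(outcomes: list) -> float:
    """Fraction of attack trials that produced the attacker-chosen transcription."""
    if not outcomes:
        raise ValueError("at least one trial is required")
    return sum(bool(o) for o in outcomes) / len(outcomes)


# Hypothetical numbers, for illustration only.
n_poison = max_poison_samples(10_000, "0.0017")
rate = attack_success_rate([True, True, False, True])
```

Under these assumed inputs, a training set of 10,000 utterances would allow at most 16 poisoned samples while staying below a 0.17% budget, and three successes in four trials correspond to a 75% success rate.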

External Datasets

TIDIGITS

Speech Commands