The application of deep recurrent networks to audio transcription has led to
impressive gains in automatic speech recognition (ASR) systems. However, prior
work has demonstrated that small adversarial perturbations can fool deep neural networks
into incorrectly predicting a specified target with high confidence. Current
work on fooling ASR systems has focused on white-box attacks, in which the
model architecture and parameters are known. In this paper, we adopt a
black-box approach to adversarial generation, combining genetic algorithms
with gradient estimation to solve the task. We achieve an
89.25% targeted attack similarity after 3000 generations while maintaining
94.6% audio file similarity.
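
To make the combination of genetic search and gradient estimation concrete, the following is a minimal sketch on a toy problem, not the implementation used in this paper. It assumes a hypothetical `black_box_score` function that stands in for querying an ASR model (it only returns a score and exposes no gradients), and the names `genetic_step`, `estimate_gradient`, and `attack`, along with all hyperparameters, are illustrative choices.

```python
import random

def black_box_score(candidate, hidden_target):
    # Hypothetical stand-in for an ASR query: higher is better, no gradients exposed.
    return -sum((c - t) ** 2 for c, t in zip(candidate, hidden_target))

def genetic_step(population, score_fn, rng, elite_size=4, mutation_std=0.05):
    # Keep the highest-scoring candidates, then refill the population with
    # mutated crossovers of randomly paired elite members.
    ranked = sorted(population, key=score_fn, reverse=True)
    elite = ranked[:elite_size]
    children = []
    while len(elite) + len(children) < len(population):
        a, b = rng.sample(elite, 2)
        child = [x if rng.random() < 0.5 else y for x, y in zip(a, b)]
        child = [v + rng.gauss(0.0, mutation_std) for v in child]
        children.append(child)
    return elite + children

def estimate_gradient(candidate, score_fn, indices, delta=1e-3):
    # Forward finite differences on a subset of coordinates, using only queries.
    base = score_fn(candidate)
    grad = [0.0] * len(candidate)
    for i in indices:
        probe = list(candidate)
        probe[i] += delta
        grad[i] = (score_fn(probe) - base) / delta
    return grad

def attack(hidden_target, generations=200, pop_size=20, step=0.05, seed=0):
    rng = random.Random(seed)
    dim = len(hidden_target)

    def score_fn(candidate):
        return black_box_score(candidate, hidden_target)

    population = [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(pop_size)]
    for _ in range(generations):
        population = genetic_step(population, score_fn, rng)
        # Refine the top elite member with an estimated-gradient ascent step,
        # keeping the result only if the black-box score improves.
        best = population[0]
        indices = rng.sample(range(dim), k=max(1, dim // 2))
        grad = estimate_gradient(best, score_fn, indices)
        refined = [v + step * g for v, g in zip(best, grad)]
        if score_fn(refined) > score_fn(best):
            population[0] = refined
    return max(population, key=score_fn)

if __name__ == "__main__":
    target = [0.5, -0.3, 0.8, 0.1]
    result = attack(target)
    print("final score:", black_box_score(result, target))
```

In this toy setting the score is a simple quadratic, so both phases converge quickly. With a real ASR system, each call to the scoring function would be a model query, which is why the gradient is estimated on only a subset of coordinates per generation.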