As Voice Processing Systems (VPS) become more prevalent in our daily lives
through growing reliance on applications such as commercial voice recognition
devices and major text-to-speech software, attacks on these systems are
becoming more complex and varied, and are constantly evolving. With the use
cases for VPS rapidly growing into new spaces
and purposes, the potential consequences for privacy are increasingly
dangerous. In addition, the growing number and increased practicality of
over-the-air attacks have made system failures much more probable. In this
paper, we identify and classify an array of unique attacks on voice
processing systems. Over the years, research has moved from specialized,
untargeted attacks that result in the malfunction of systems and the denial of
services to more general, targeted attacks that can force an outcome controlled
by an adversary. Today's most frequently used machine learning systems
and deep neural networks, which are at the core of modern voice processing
systems, were built with a focus on performance and scalability rather than
security. Therefore, it is critical for us to reassess the developing voice
processing landscape and to identify the state of current attacks and defenses
so that we may suggest future developments and theoretical improvements.