Deep learning has undoubtedly offered tremendous improvements in the
performance of state-of-the-art speech emotion recognition (SER) systems.
However, recent research on adversarial examples poses significant challenges to
the robustness of SER systems, showing that deep neural networks are susceptible
to adversarial examples crafted from only small, imperceptible perturbations of
the input. In this study, we evaluate how adversarial examples can be used
to attack SER systems and propose the first black-box adversarial attack on SER
systems. We also explore potential defenses, including adversarial training and
a generative adversarial network (GAN)-based defense, to enhance robustness. Experimental
evaluations reveal several insights into the effective use of adversarial
examples for improving the robustness of SER systems, opening up opportunities
for researchers to further innovate in this space.
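To make the notion of small, imperceptible perturbations concrete, the following is a minimal NumPy sketch of an FGSM-style (fast gradient sign method) perturbation applied to a toy linear classifier. The model, feature dimensions, and epsilon value here are purely illustrative assumptions, not the attack or models used in this work:

```python
import numpy as np

def fgsm_perturb(x, grad, eps=0.05):
    """One FGSM step: nudge each input dimension by +/- eps in the
    direction that increases the loss, bounding the perturbation by eps."""
    return x + eps * np.sign(grad)

# Toy stand-in for a deep SER model: a linear classifier on a
# 16-dimensional acoustic feature vector (hypothetical setup).
rng = np.random.default_rng(0)
w = rng.normal(size=16)   # model weights
x = rng.normal(size=16)   # clean input features
y = 1.0                   # true label in {-1, +1}

# Logistic loss L = log(1 + exp(-y * w.x)); its gradient w.r.t. x is
# -y * w * sigmoid(-y * w.x).
margin = y * (w @ x)
grad_x = -y * w / (1.0 + np.exp(margin))

x_adv = fgsm_perturb(x, grad_x, eps=0.05)

# Each feature moves by at most eps, so the change stays "small".
print(np.max(np.abs(x_adv - x)))
```

For a deep network the gradient would come from backpropagation rather than a closed form, but the bounded sign step is the same; the attack succeeds when this small shift flips the predicted emotion class.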