Despite the remarkable performance and generalization capabilities of deep
learning models across a wide range of artificial intelligence tasks, it has been
demonstrated that these models can be easily fooled by the addition of
imperceptible yet malicious perturbations to natural inputs. These altered
inputs are known in the literature as adversarial examples. In this paper, we
propose a novel probabilistic framework that generalizes and extends adversarial
attacks so that, when the attack is applied to a large number of inputs, the
predicted classes follow a desired probability distribution. This novel attack
paradigm gives the adversary greater control over the target model, exposing,
in a wide range of scenarios, threats against deep learning models that cannot
be mounted under conventional attack paradigms. We introduce
four different strategies to efficiently generate such attacks, and illustrate
our approach by extending multiple adversarial attack algorithms. We also
experimentally validate our approach on the spoken command classification task
and the Tweet emotion classification task, two exemplary machine learning
problems in the audio and text domains, respectively. Our results demonstrate
that we can closely approximate any target probability distribution over the
classes while maintaining a high fooling rate, and that the attacks can even
evade label-shift detection methods.
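
To make the attack paradigm concrete, below is a minimal sketch (ours, not the paper's algorithm) of one natural strategy: for each input, sample a target class from the desired distribution and run any off-the-shelf targeted attack toward it. The function `targeted_attack` is a hypothetical placeholder for such an algorithm.

```python
# Sketch only: steer the empirical distribution of predicted classes
# toward a desired target distribution by sampling per-input targets.
# `targeted_attack` is a hypothetical stand-in for any targeted
# adversarial attack (e.g., a targeted PGD or CW implementation).
import numpy as np

def distribution_aware_attack(inputs, target_dist, targeted_attack, rng=None):
    """For each input, draw a target class y ~ target_dist and perturb
    the input toward y; over many inputs, the classes predicted for the
    adversarial examples should approximate target_dist."""
    rng = rng or np.random.default_rng()
    k = len(target_dist)
    adversarial = []
    for x in inputs:
        # Sample this input's target label from the desired distribution.
        y_target = rng.choice(k, p=target_dist)
        adversarial.append(targeted_attack(x, y_target))
    return adversarial
```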