Machine learning algorithms are vulnerable to poisoning attacks: an adversary
can inject malicious points into the training dataset to influence the learning
process and degrade the algorithm's performance. Optimal poisoning attacks,
which model the attack as a bi-level optimization problem, have already been
proposed to evaluate worst-case scenarios. However, solving these problems is
computationally demanding, which limits the applicability of such attacks to
some models, such as deep networks.
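For context, a sketch of the bi-level formulation that is standard in the poisoning literature (notation ours, not taken from this paper): the attacker chooses poisoning points $D_p$ that maximize the learner's loss on clean data, while the classifier parameters are those obtained by training on the poisoned set.

```latex
% Standard bi-level poisoning objective (notation assumed):
% D_p: poisoning points, D_tr: clean training set, D_val: clean validation set,
% \mathcal{L}: the learner's loss, \theta: classifier parameters.
\max_{D_p} \; \mathcal{L}\left(D_{val};\, \theta^{*}\right)
\quad \text{s.t.} \quad
\theta^{*} \in \arg\min_{\theta} \; \mathcal{L}\left(D_{tr} \cup D_p;\, \theta\right)
```

The inner problem (training the classifier) must in principle be re-solved for every candidate $D_p$, which is what makes these attacks expensive for models like deep networks.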
In this paper we introduce a novel generative model to craft systematic
poisoning attacks against machine learning classifiers by generating
adversarial training examples, i.e., samples that look like genuine data points
but degrade the classifier's accuracy when used for training. We propose a
Generative Adversarial Net with three components: a generator, a discriminator,
and the target classifier. This approach allows us to naturally model the
detectability constraints that can be expected in realistic attacks and to
identify the regions of the underlying data distribution that are more
vulnerable to data poisoning. Our experimental evaluation shows the
effectiveness of our attack at compromising machine learning classifiers,
including deep networks.
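To make the three-player setup concrete, here is a minimal, hypothetical sketch in PyTorch; it is not the authors' implementation, and the architectures, losses, toy data, and the trade-off weight `lam` are all assumptions. The discriminator enforces that poisoned points look genuine (the detectability constraint), the target classifier takes training steps on genuine plus poisoned data, and the generator tries to fool the discriminator while increasing the classifier's loss on the generated points.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
dim, n, batch = 2, 256, 64

# Toy "genuine" data: two Gaussian blobs with binary labels (assumed).
x_real = torch.cat([torch.randn(n, dim) - 2.0, torch.randn(n, dim) + 2.0])
y_real = torch.cat([torch.zeros(n), torch.ones(n)]).long()

G = nn.Sequential(nn.Linear(dim, 32), nn.ReLU(), nn.Linear(32, dim))  # generator
D = nn.Sequential(nn.Linear(dim, 32), nn.ReLU(), nn.Linear(32, 1))    # discriminator
C = nn.Sequential(nn.Linear(dim, 32), nn.ReLU(), nn.Linear(32, 2))    # target classifier

opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
opt_c = torch.optim.Adam(C.parameters(), lr=1e-3)
bce, ce = nn.BCEWithLogitsLoss(), nn.CrossEntropyLoss()
lam = 0.5                            # detectability/damage trade-off (assumed)
y_poison = torch.ones(batch).long()  # attacker-chosen labels for the poison

for step in range(2000):
    x_fake = G(torch.randn(batch, dim))

    # 1) Discriminator: genuine vs. generated (the detectability constraint).
    opt_d.zero_grad()
    d_loss = (bce(D(x_real), torch.ones(2 * n, 1))
              + bce(D(x_fake.detach()), torch.zeros(batch, 1)))
    d_loss.backward()
    opt_d.step()

    # 2) Target classifier: one training step on genuine plus poisoned data.
    opt_c.zero_grad()
    c_loss = ce(C(x_real), y_real) + ce(C(x_fake.detach()), y_poison)
    c_loss.backward()
    opt_c.step()

    # 3) Generator: look genuine to D while maximizing the classifier's
    #    loss on the generated (poisoned) points.
    opt_g.zero_grad()
    g_loss = (lam * bce(D(x_fake), torch.ones(batch, 1))
              - (1 - lam) * ce(C(x_fake), y_poison))
    g_loss.backward()
    opt_g.step()
```

The alternating single-step updates stand in for solving the inner training problem of the bi-level formulation to convergence at every attacker step, which is what would keep an attack of this kind tractable for deep networks.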