In many real-world applications of Machine Learning it is of paramount
importance not only to provide accurate predictions, but also to ensure certain
levels of robustness. Adversarial Training is a training procedure aiming at
providing models that are robust to worst-case perturbations around predefined
points. Unfortunately, one of the main issues in adversarial training is that
robustness w.r.t. gradient-based attackers is always achieved at the cost of
prediction accuracy. This paper proposes a new algorithm for adversarial
training, called Wasserstein Projected Gradient Descent (WPGD). WPGD
provides a simple way to obtain cost-sensitive robustness, yielding finer
control of the robustness-accuracy trade-off. Moreover, WPGD solves an optimal
transport problem on the output space of the network and can efficiently
discover directions where robustness is required, which makes it possible to
control the directional trade-off between accuracy and robustness. The proposed WPGD is
validated in this work on image recognition tasks with different benchmark
datasets and architectures. Moreover, real-world-like datasets are often
unbalanced: this paper shows that, when dealing with this type of dataset, the
performance of adversarial training is mainly affected in terms of standard
accuracy.