In this work, we consider a challenging training-time attack that modifies the
training data with bounded perturbations, aiming to manipulate the behavior
(either targeted or non-targeted) of any classifier subsequently trained on
that data when it faces clean samples at test time. To achieve this, we propose
using an auto-encoder-like network to generate the perturbation on the training
data, paired with a differentiable system acting as the imaginary victim
classifier. The perturbation generator learns to update its weights by watching
the training procedure of the imaginary classifier, so as to produce the most
harmful yet imperceptible noise, which in turn leads to the lowest
generalization power for the victim classifier. This can be formulated as a
non-linear equality-constrained optimization problem.
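Concretely, the objective can be sketched as a bi-level program; the notation
below is introduced here for illustration ($g_{\xi}$ is the perturbation
generator, $f_{\theta}$ the imaginary victim, $\epsilon$ the perturbation
bound, and $\mathcal{L}$ the training loss):
\[
\max_{\xi}\; \mathbb{E}_{(x,y)}\!\left[\mathcal{L}\!\left(f_{\theta^{*}(\xi)}(x),\, y\right)\right]
\quad \text{s.t.} \quad
\theta^{*}(\xi) \in \arg\min_{\theta} \sum_{i} \mathcal{L}\!\left(f_{\theta}\!\left(x_i + g_{\xi}(x_i)\right),\, y_i\right),
\qquad \left\lVert g_{\xi}(\cdot) \right\rVert_{\infty} \le \epsilon .
\]
Here the equality constraint encodes the victim's training procedure on the
perturbed data, while the outer objective is evaluated on clean samples.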
Unlike GANs, solving such a problem is computationally challenging, so we
propose a simple yet effective procedure that decouples the alternating updates
of the two networks for stability.
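As a rough illustration of these decoupled alternating updates, the sketch
below uses a one-step unrolled victim as a stand-in for the full training
trajectory; all names, dimensions, and learning rates are assumptions made here
for the example, not details from the paper:

```python
# Minimal sketch (assumed details, not the authors' code) of decoupled
# alternating updates: a linear "generator" G crafts bounded noise, a linear
# imaginary victim W trains on the poisoned data, and G is updated by
# unrolling one virtual victim step and ascending the clean-data loss.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
d, k, eps = 32, 4, 0.06            # input dim, #classes, noise bound (assumed)
lr_f, lr_g = 0.5, 0.05             # victim / generator step sizes (assumed)

W = torch.zeros(d, k, requires_grad=True)   # imaginary victim classifier
G = torch.zeros(d, d, requires_grad=True)   # stand-in for the auto-encoder

def noise(x):
    # perturbation bounded in l-infinity norm by eps via tanh squashing
    return eps * torch.tanh(x @ G)

for step in range(200):
    x = torch.randn(64, d)                  # stand-in training batch
    y = torch.randint(0, k, (64,))

    # (1) generator update: simulate one victim step on poisoned data,
    #     then maximize the simulated victim's loss on the clean batch
    poisoned_loss = F.cross_entropy((x + noise(x)) @ W, y)
    (grad_W,) = torch.autograd.grad(poisoned_loss, W, create_graph=True)
    W_virtual = W - lr_f * grad_W           # unrolled (imaginary) victim step
    clean_loss = F.cross_entropy(x @ W_virtual, y)
    (grad_G,) = torch.autograd.grad(-clean_loss, G)  # ascend clean-data loss
    with torch.no_grad():
        G -= lr_g * grad_G

    # (2) victim update: ordinary training step on the now-fixed poisoned data
    x_p = (x + noise(x)).detach()           # freeze the generator's output
    (grad_W,) = torch.autograd.grad(F.cross_entropy(x_p @ W, y), W)
    with torch.no_grad():
        W -= lr_f * grad_W
```

Separating the two steps in this way keeps each update against a fixed
counterpart, which is the source of the stability that the decoupling is meant
to provide.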
The proposed method extends easily to the label-specific setting, in which the
attacker manipulates the victim classifier's predictions according to
predefined rules rather than merely inducing wrong predictions.
Experiments on various datasets, including CIFAR-10 and a reduced version of
ImageNet, confirm the effectiveness of the proposed method, and empirical
results show that such bounded perturbations transfer well on image data,
regardless of which classifier the victim actually uses.