In this work, we consider a challenging training-time attack that modifies the
training data with bounded perturbations, aiming to manipulate the behavior
(either targeted or non-targeted) of any classifier subsequently trained on
that data when it faces clean samples at test time. To achieve this, we propose
using an auto-encoder-like network to generate the perturbation on the training
data, paired with a differentiable system acting as the imaginary victim
classifier. The perturbation generator learns to update its weights by watching
the training procedure of the imaginary classifier, so as to produce the most
harmful yet imperceptible noise, which in turn leads to the lowest
generalization power for the victim classifier. This can be formulated as a
non-linear equality-constrained optimization problem.
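Concretely, the objective can be sketched as a bi-level program; the notation
below is introduced here for illustration ($g_{\xi}$ is the perturbation
generator, $f_{\theta}$ the imaginary victim, $\epsilon$ the perturbation
bound, and $\mathcal{L}$ the training loss):
\[
\max_{\xi}\; \mathbb{E}_{(x,y)}\!\left[\mathcal{L}\!\left(f_{\theta^{*}(\xi)}(x),\, y\right)\right]
\quad \text{s.t.} \quad
\theta^{*}(\xi) \in \arg\min_{\theta} \sum_{i} \mathcal{L}\!\left(f_{\theta}\!\left(x_i + g_{\xi}(x_i)\right),\, y_i\right),
\qquad \left\lVert g_{\xi}(\cdot) \right\rVert_{\infty} \le \epsilon .
\]
Here the equality constraint encodes the victim's training procedure on the
perturbed data, while the outer objective is evaluated on clean samples.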
Unlike GANs, solving such a problem is computationally challenging, so we
propose a simple yet effective procedure that decouples the alternating updates
of the two networks for stability.
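As a rough illustration of these decoupled alternating updates, the sketch
below uses a one-step unrolled victim as a stand-in for the full training
trajectory; all names, dimensions, and learning rates are assumptions made here
for the example, not details from the paper:

```python
# Minimal sketch (assumed details, not the authors' code) of decoupled
# alternating updates: a linear "generator" G crafts bounded noise, a linear
# imaginary victim W trains on the poisoned data, and G is updated by
# unrolling one virtual victim step and ascending the clean-data loss.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
d, k, eps = 32, 4, 0.06            # input dim, #classes, noise bound (assumed)
lr_f, lr_g = 0.5, 0.05             # victim / generator step sizes (assumed)

W = torch.zeros(d, k, requires_grad=True)   # imaginary victim classifier
G = torch.zeros(d, d, requires_grad=True)   # stand-in for the auto-encoder

def noise(x):
    # perturbation bounded in l-infinity norm by eps via tanh squashing
    return eps * torch.tanh(x @ G)

for step in range(200):
    x = torch.randn(64, d)                  # stand-in training batch
    y = torch.randint(0, k, (64,))

    # (1) generator update: simulate one victim step on poisoned data,
    #     then maximize the simulated victim's loss on the clean batch
    poisoned_loss = F.cross_entropy((x + noise(x)) @ W, y)
    (grad_W,) = torch.autograd.grad(poisoned_loss, W, create_graph=True)
    W_virtual = W - lr_f * grad_W           # unrolled (imaginary) victim step
    clean_loss = F.cross_entropy(x @ W_virtual, y)
    (grad_G,) = torch.autograd.grad(-clean_loss, G)  # ascend clean-data loss
    with torch.no_grad():
        G -= lr_g * grad_G

    # (2) victim update: ordinary training step on the now-fixed poisoned data
    x_p = (x + noise(x)).detach()           # freeze the generator's output
    (grad_W,) = torch.autograd.grad(F.cross_entropy(x_p @ W, y), W)
    with torch.no_grad():
        W -= lr_f * grad_W
```

Separating the two steps in this way keeps each update against a fixed
counterpart, which is the source of the stability that the decoupling is meant
to provide.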
The proposed method extends easily to the label-specific setting, in which the
attacker manipulates the victim classifier's predictions according to
predefined rules rather than merely inducing wrong predictions.
Experiments on various datasets, including CIFAR-10 and a reduced version of
ImageNet, confirm the effectiveness of the proposed method, and empirical
results show that such bounded perturbations transfer well on image data,
regardless of which classifier the victim actually uses.