Purifying Adversarial Perturbation with Adversarially Trained Auto-encoders

Authors: Hebi Li, Qi Xiao, Shixin Tian, Jin Tian
Published: 2019-05-26

Source: https://arxiv.org/abs/1905.10729

PDF: https://arxiv.org/pdf/1905.10729

Labels Predicted by AI

Vulnerability of Adversarial Examples Machine Learning Method Attack Type

Please note that these labels were automatically added by AI. Therefore, they may not be entirely accurate.
For more details, please see the About the Literature Database page.

Abstract

Machine learning models are vulnerable to adversarial examples. Iterative adversarial training has shown promising results against strong white-box attacks. However, adversarial training is very expensive, and every time a model needs to be protected, such expensive training scheme needs to be performed. In this paper, we propose to apply iterative adversarial training scheme to an external auto-encoder, which once trained can be used to protect other models directly. We empirically show that our model outperforms other purifying-based methods against white-box attacks, and transfers well to directly protect other base models with different architectures.