Abstract
Machine learning models are vulnerable to adversarial examples. Iterative
adversarial training has shown promising results against strong white-box
attacks. However, adversarial training is very expensive, and this expensive
training scheme must be repeated every time a model needs to be protected. In
this paper, we propose applying an iterative adversarial training scheme to an
external auto-encoder, which, once trained, can be used to protect other models
directly. We empirically show that our model outperforms other
purification-based methods against white-box attacks and transfers well,
directly protecting other base models with different architectures.
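
The idea summarized above can be illustrated with a small toy sketch. It is not the paper's implementation: the abstract does not specify the architecture, the loss, or the attack, so everything here is an assumption made for illustration. The "auto-encoder" is reduced to a single linear map `W` with no bottleneck and no reconstruction term, the protected model is a fixed logistic-regression classifier `(w, b)`, the attack is an L-infinity PGD attack on the composed model, and the names `loss_grads`, `pgd_attack`, and `train_purifier` are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss_grads(x, y, W, w, b):
    """Gradients of the binary cross-entropy of f(g(x)).

    Assumed toy setup: purifier g(x) = x @ W, fixed classifier
    f(z) = sigmoid(z @ w + b). Returns the per-example gradient with
    respect to x and the batch-mean gradient with respect to W.
    """
    p = sigmoid(x @ W @ w + b)                  # predicted probabilities, shape (n,)
    err = (p - y)[:, None]                      # d(loss)/d(logit), shape (n, 1)
    grad_x = err * (W @ w)[None, :]             # shape (n, d)
    grad_W = x.T @ (err * w[None, :]) / len(x)  # shape (d, d)
    return grad_x, grad_W

def pgd_attack(x, y, W, w, b, eps=0.1, steps=10, rng=None):
    """White-box L-infinity PGD against the composed model f(g(x))."""
    rng = np.random.default_rng(0) if rng is None else rng
    alpha = eps / 4                             # assumed step size
    x_adv = x + rng.uniform(-eps, eps, size=x.shape)
    for _ in range(steps):
        grad_x, _ = loss_grads(x_adv, y, W, w, b)
        x_adv = x_adv + alpha * np.sign(grad_x)    # ascend the loss
        x_adv = np.clip(x_adv, x - eps, x + eps)   # project back into the eps-ball
    return x_adv

def train_purifier(x, y, w, b, epochs=50, lr=0.1, eps=0.1, seed=0):
    """Iterative adversarial training of the purifier only; (w, b) stay frozen."""
    rng = np.random.default_rng(seed)
    W = np.eye(x.shape[1])                      # start from the identity map
    for _ in range(epochs):
        x_adv = pgd_attack(x, y, W, w, b, eps=eps, rng=rng)  # fresh attack each epoch
        _, grad_W = loss_grads(x_adv, y, W, w, b)
        W -= lr * grad_W                        # update the purifier, not the classifier
    return W
```

In this sketch, protecting a model means classifying `x @ W` instead of `x`. Whether a purifier trained this way transfers to other classifiers is the empirical claim of the paper and is not demonstrated by this toy.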