Deep neural networks have been shown to be vulnerable to adversarial
attacks, in which small perturbations intentionally added to the original
inputs can fool the classifier. In this paper, we propose a defense method,
Featurized Bidirectional Generative Adversarial Networks (FBGAN), that
extracts the semantic features of the input and filters out non-semantic
perturbations. FBGAN is
pre-trained on the clean dataset in an unsupervised manner, adversarially
learning a bidirectional mapping between the high-dimensional data space and
a low-dimensional semantic space; in addition, mutual information is used to
disentangle semantically meaningful features. Through the bidirectional
mapping, adversarial inputs are reconstructed as denoised data, which can
then be fed into any pre-trained classifier. We empirically demonstrate the
quality of the reconstructed images and the effectiveness of the defense.
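The inference-time idea (classify the reconstruction G(E(x)) instead of the raw input x) can be illustrated with a toy sketch. It is not the FBGAN implementation. The encoder and generator here are hypothetical hand-written linear stand-ins: the encoder simply keeps the first K coordinates, which are assumed to be the "semantic" ones. No adversarial training or mutual-information term is modeled. The names `encode`, `generate`, `classify`, `defend`, and `K` are illustrative only.

```python
from typing import Callable, List

Vector = List[float]

# Assumption: the first K coordinates play the role of the "semantic" space.
# In FBGAN, E and G would be learned networks, not this fixed projection.
K = 2


def encode(x: Vector) -> Vector:
    """Toy encoder E: data space -> low-dimensional semantic space."""
    return x[:K]


def generate(z: Vector, dim: int) -> Vector:
    """Toy generator G: semantic space -> data space (zero-fills the rest)."""
    return list(z) + [0.0] * (dim - len(z))


def classify(x: Vector) -> int:
    """Hypothetical pre-trained classifier that also reacts to a
    non-semantic coordinate (x[2]), so a perturbation there can flip it."""
    score = x[0] + x[1] - 5.0 * x[2]
    return 1 if score > 0 else 0


def defend(x: Vector, classifier: Callable[[Vector], int]) -> int:
    """Reconstruct x as G(E(x)) and classify the reconstruction."""
    x_rec = generate(encode(x), len(x))
    return classifier(x_rec)


if __name__ == "__main__":
    clean = [1.0, 1.0, 0.0, 0.0]
    adversarial = [1.0, 1.0, 0.5, 0.0]  # perturbed only in a non-semantic coordinate
    print(classify(clean))               # -> 1
    print(classify(adversarial))         # -> 0 (attack succeeds on raw input)
    print(defend(adversarial, classify)) # -> 1 (perturbation filtered out)
```

Because the perturbation lies outside the assumed semantic subspace, the encode/decode round trip discards it, and the unchanged classifier sees an input equivalent to the clean one. A learned mapping would only approximate this behavior.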