Abstract
Modern image classification systems are often built on deep neural networks,
which are vulnerable to adversarial examples: images carrying deliberately
crafted, imperceptible noise that misleads the network's classification. To defend against
adversarial examples, a plausible idea is to obfuscate the network's gradient
with respect to the input image. This general idea has inspired a long line of
defense methods. Yet, almost all of them have proven vulnerable. We revisit
this seemingly flawed idea from a radically different perspective. We embrace
the omnipresence of adversarial examples and the numerical procedure of
crafting them, and turn this harmful attacking process into a useful defense
mechanism. Our defense method is conceptually simple: before feeding an input
image for classification, transform it by finding an adversarial example on a
pre-trained external model. We evaluate our method against a wide range of
possible attacks. On both the CIFAR-10 and Tiny ImageNet datasets, our method is
significantly more robust than state-of-the-art methods. In particular,
compared with adversarial training, our method offers both lower training cost
and stronger robustness.
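
The input transformation described above can be illustrated with a minimal sketch. The abstract does not say which attack procedure or external model is used, so everything here is an assumption made for illustration: the external model is a toy linear softmax classifier (`W`, `b`), the adversarial example is found with a PGD-style signed-gradient iteration, and the names `adversarial_transform`, `defended_classify`, `eps`, `step`, and `iters` are hypothetical. Pixel values are assumed to lie in [0, 1].

```python
import math

def logits(W, b, x):
    """Linear scores of the (assumed) external model: W @ x + b."""
    return [sum(w * v for w, v in zip(row, x)) + bk for row, bk in zip(W, b)]

def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def cross_entropy(W, b, x, label):
    return -math.log(softmax(logits(W, b, x))[label])

def input_gradient(W, b, x, label):
    """Gradient of the cross-entropy loss with respect to the input x."""
    p = softmax(logits(W, b, x))
    p[label] -= 1.0  # dL/dlogits = softmax - one_hot(label)
    return [sum(p[k] * W[k][j] for k in range(len(W))) for j in range(len(x))]

def adversarial_transform(W, b, x, eps=0.1, step=0.02, iters=10):
    """Perturb x into an adversarial example for the external model.

    Assumption: signed-gradient ascent on the external model's loss,
    using that model's own prediction on x as the label.
    """
    z = logits(W, b, x)
    label = max(range(len(z)), key=z.__getitem__)
    x_adv = list(x)
    for _ in range(iters):
        g = input_gradient(W, b, x_adv, label)
        for j, gj in enumerate(g):
            moved = x_adv[j] + step * ((gj > 0) - (gj < 0))
            moved = min(max(moved, x[j] - eps), x[j] + eps)  # stay within eps of x
            x_adv[j] = min(max(moved, 0.0), 1.0)             # keep a valid pixel value
    return x_adv

def defended_classify(classify, W, b, x, **kwargs):
    """Transform the input on the external model, then run the real classifier."""
    return classify(adversarial_transform(W, b, x, **kwargs))
```

In this sketch `classify` stands in for the protected classifier and can be any callable that accepts the transformed input. The external model's parameters are never updated, which is consistent with the abstract's description of a pre-trained external model.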