Abstract
With the rapid advancement and increasing deployment of deep learning models in
image classification, security has become a major concern for their use in
safety-critical systems. Since the accuracy and robustness of deep learning
models depend primarily on the purity of the training samples, deep learning
architectures are often susceptible to adversarial attacks. Adversarial
examples are typically crafted by adding subtle perturbations to normal images;
these perturbations are mostly imperceptible to humans but can seriously
confuse state-of-the-art machine learning models. We propose a framework, named
APuDAE, that leverages Denoising AutoEncoders (DAEs) in an adaptive way to
purify adversarial samples and thus improve the classification accuracy of the
attacked target classifier networks. We also show how using DAEs adaptively,
rather than directly, further improves classification accuracy and provides
greater robustness against adaptive attacks designed to fool the purifier. We
demonstrate our results on the MNIST, CIFAR-10, and ImageNet datasets and show
that our framework (APuDAE) performs comparably to, and in most cases better
than, the baseline methods in purifying adversaries. We also design an adaptive
attack specifically targeted at our purification model and demonstrate that our
defense is robust to it.
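The abstract describes purification only at a high level. As a minimal sketch, assuming a PyTorch setup, the basic (non-adaptive) step of passing an input through a DAE before classification could look as follows; the architecture and all names here are illustrative assumptions, not the paper's actual implementation:

```python
import torch
import torch.nn as nn

class DenoisingAutoEncoder(nn.Module):
    """A small convolutional DAE (illustrative; the paper's architecture is not given in the abstract)."""
    def __init__(self, channels: int = 1):
        super().__init__()
        # Encoder halves the spatial resolution twice.
        self.encoder = nn.Sequential(
            nn.Conv2d(channels, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        # Decoder restores the original resolution.
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 3, stride=2, padding=1, output_padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, channels, 3, stride=2, padding=1, output_padding=1),
            nn.Sigmoid(),  # pixel values in [0, 1]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.decoder(self.encoder(x))

def purify_and_classify(x_adv: torch.Tensor, dae: nn.Module, classifier: nn.Module) -> torch.Tensor:
    """Purify a (possibly adversarial) image batch with the DAE, then classify it."""
    with torch.no_grad():
        x_purified = dae(x_adv)          # remove adversarial perturbation
        return classifier(x_purified).argmax(dim=1)
```

The adaptive use of the DAE that APuDAE proposes would refine this single forward pass (e.g., iteratively); that logic is not specified in the abstract and is therefore not sketched here.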