Abstract
Adversarial training is by far the most successful strategy for improving the
robustness of neural networks to adversarial attacks. Despite its success as a
defense mechanism, adversarial training fails to generalize well to the
unperturbed test set. We hypothesize that this poor generalization is a
consequence of adversarial training with a uniform perturbation radius around
every training sample. Samples close to the decision boundary can be morphed
into a different class under a small perturbation budget, and enforcing large
margins around these samples produces poor decision boundaries that generalize
poorly.
Motivated by this hypothesis, we propose instance adaptive adversarial
training, a technique that enforces sample-specific perturbation margins around
every training sample. We show that using our approach, test accuracy on
unperturbed samples improves with only a marginal drop in robustness. Extensive
experiments on the CIFAR-10, CIFAR-100, and ImageNet datasets demonstrate the
effectiveness of our proposed approach.
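
To make the idea concrete, below is a minimal PyTorch sketch of adversarial
training with a per-sample perturbation radius. The abstract does not specify
how the radii are chosen, so the epsilon-update rule here (grow a sample's
radius while it survives the attack, shrink it once the attack flips its
class) is an illustrative assumption, not the paper's exact algorithm; the
names eps_state and the index-yielding data loader are likewise hypothetical.

# Minimal sketch of instance-adaptive adversarial training with L-inf PGD.
# The per-sample epsilon update rule is an illustrative assumption, not the
# paper's exact algorithm; `eps_state` and the index-yielding loader are
# hypothetical names introduced for this sketch.
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps, n_steps=10):
    """L-inf PGD where `eps` holds one radius per sample, shape [B, 1, 1, 1]."""
    step = eps / 4  # per-sample step size, a common heuristic
    delta = (torch.empty_like(x).uniform_(-1.0, 1.0) * eps).requires_grad_(True)
    for _ in range(n_steps):
        loss = F.cross_entropy(model(x + delta), y)
        (grad,) = torch.autograd.grad(loss, delta)
        # Ascend the loss, then project back into each sample's own eps-ball
        # and keep pixel values in [0, 1].
        delta = torch.minimum(torch.maximum(delta + step * grad.sign(), -eps), eps)
        delta = ((x + delta).clamp(0.0, 1.0) - x).detach().requires_grad_(True)
    return (x + delta).detach()

def train_epoch(model, loader, opt, eps_state, device,
                eps_step=2 / 255, eps_min=1 / 255, eps_max=16 / 255):
    """One epoch of adversarial training with sample-specific radii.

    `loader` is assumed to yield (sample_indices, images, labels) so that each
    sample's radius in `eps_state` (a CPU tensor of shape [N]) can be updated.
    """
    model.train()
    for idx, x, y in loader:
        x, y = x.to(device), y.to(device)
        eps = eps_state[idx].to(device).view(-1, 1, 1, 1)
        x_adv = pgd_attack(model, x, y, eps)
        logits = model(x_adv)
        # Adapt each radius: grow it while the sample survives the attack,
        # shrink it once the attack flips the sample's class (assumed rule).
        with torch.no_grad():
            survived = logits.argmax(dim=1) == y
            new_eps = torch.where(survived,
                                  eps.view(-1) + eps_step,
                                  eps.view(-1) - eps_step)
            eps_state[idx] = new_eps.clamp(eps_min, eps_max).cpu()
        # Standard adversarial training step on the perturbed batch.
        opt.zero_grad()
        F.cross_entropy(logits, y).backward()
        opt.step()

In this sketch, eps_state would be initialized uniformly, e.g.
torch.full((len(train_set),), 8 / 255), and then drifts per sample over
epochs, so that samples near the decision boundary end up with small radii
while samples far from it retain large ones.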