These labels were automatically added by AI and may be inaccurate. For details, see About Literature Database.
Abstract
Two key challenges facing modern deep learning are mitigating deep networks'
vulnerability to adversarial attacks and understanding deep learning's
generalization capabilities. Towards the first issue, many defense strategies
have been developed, with the most common being Adversarial Training (AT).
Towards the second challenge, one of the dominant theories that has emerged is
the Neural Tangent Kernel (NTK) -- a characterization of neural network
behavior in the infinite-width limit. In this limit, the kernel is frozen, and
the underlying feature map is fixed. In finite widths, however, there is
evidence that feature learning happens at the earlier stages of the training
(kernel learning) before a second phase where the kernel remains fixed (lazy
training). While prior work has aimed at studying adversarial vulnerability
through the lens of the frozen infinite-width NTK, there is no work that
studies the adversarial robustness of the empirical/finite NTK during training.
In this work, we perform an empirical study of the evolution of the empirical
NTK under standard and adversarial training, aiming to disambiguate the effect
of adversarial training on kernel learning and lazy training. We find under
adversarial training, the empirical NTK rapidly converges to a different kernel
(and feature map) than standard training. This new kernel provides adversarial
robustness, even when non-robust training is performed on top of it.
Furthermore, we find that adversarial training on top of a fixed kernel can
yield a classifier with $76.1\%$ robust accuracy under PGD attacks with
$\varepsilon = 4/255$ on CIFAR-10.