Deep learning models have been shown to be vulnerable to adversarial examples.
Though adversarial training can enhance model robustness, typical approaches
are computationally expensive. Recent works have proposed transferring
robustness to adversarial attacks across different tasks or models with soft
labels. Compared to soft labels, features contain rich semantic information
and hold the potential to be applied to different downstream tasks. In this paper,
we propose a novel approach, Guided Adversarial Contrastive Distillation
(GACD), to effectively transfer adversarial robustness from teacher to student
via features. We first formulate this objective as contrastive learning and
connect it with mutual information. With a well-trained teacher model as an
anchor, students are expected to extract features similar to the teacher's.
Then, considering the potential errors made by teachers, we propose a
sample-reweighted estimation to eliminate the negative effects of teacher
mistakes. With GACD, the
student not only learns to extract robust features, but also captures
structural knowledge from the teacher. Through extensive experiments on
popular datasets such as CIFAR-10, CIFAR-100, and STL-10, we demonstrate that
our approach can effectively transfer robustness across different models and
even different tasks, achieving results comparable to or better than existing
methods. In addition, we provide a detailed analysis of various methods, showing
that students produced by our approach capture more structural knowledge from
teachers and learn more robust features under adversarial attacks.
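To make the contrastive formulation concrete, below is a minimal PyTorch sketch of an InfoNCE-style objective with the teacher as anchor: each student feature is pulled toward the teacher feature of the same sample and pushed away from teacher features of other samples in the batch. The function name, temperature value, and use of in-batch negatives are illustrative assumptions on our part; the sketch omits GACD's sample reweighting and the adversarial example generation, so it should be read as a baseline contrastive distillation loss, not the paper's exact method.

```python
import torch
import torch.nn.functional as F

def contrastive_distillation_loss(student_feats, teacher_feats, temperature=0.1):
    """InfoNCE-style sketch (assumed form, not the exact GACD loss).

    student_feats, teacher_feats: (B, D) feature batches for the same inputs.
    The teacher feature of sample i is the positive for student feature i;
    teacher features of the other B-1 samples serve as negatives.
    """
    # L2-normalize so the dot products below are cosine similarities.
    s = F.normalize(student_feats, dim=1)
    t = F.normalize(teacher_feats, dim=1)
    # (B, B) similarity matrix between students (rows) and teachers (columns).
    logits = s @ t.t() / temperature
    # The diagonal entries are the student/teacher pairs for the same sample.
    targets = torch.arange(s.size(0), device=s.device)
    return F.cross_entropy(logits, targets)
```

Minimizing this cross-entropy maximizes the InfoNCE bound, a standard lower bound on the mutual information between student and teacher features, which is the connection to mutual information mentioned above.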