Existing adversarial learning approaches mostly use class labels to generate
adversarial samples that lead to incorrect predictions, which are then used to
augment the training of the model for improved robustness. While some recent
works propose semi-supervised adversarial learning methods that utilize
unlabeled data, they still require class labels. However, do we really need
class labels at all, for adversarially robust training of deep neural networks?
In this paper, we propose a novel adversarial attack for unlabeled data, which
makes the model confuse the instance-level identities of the perturbed data
samples. Further, we present a self-supervised contrastive learning framework
to adversarially train a robust neural network without labeled data, which aims
to maximize the similarity between a random augmentation of a data sample and
its instance-wise adversarial perturbation. We validate our method, Robust
Contrastive Learning (RoCL), on multiple benchmark datasets, on which it
obtains comparable robust accuracy over state-of-the-art supervised adversarial
learning methods, and significantly improved robustness against the black box
and unseen types of attacks. Moreover, with further joint fine-tuning with
supervised adversarial loss, RoCL obtains even higher robust accuracy over
using self-supervised learning alone. Notably, RoCL also demonstrate impressive
results in robust transfer learning.