In this paper, we develop improved techniques for defending against
adversarial examples at scale. First, we implement the state-of-the-art version
of adversarial training at unprecedented scale on ImageNet and investigate
whether it remains effective in this setting, an important open scientific
question (Athalye et al., 2018). Next, we introduce enhanced defenses using a
technique we call logit pairing, a method that encourages logits for pairs of
examples to be similar. When applied to clean examples and their adversarial
counterparts, logit pairing improves accuracy on adversarial examples over
vanilla adversarial training; we also find that logit pairing on clean examples
alone is competitive with adversarial training in terms of accuracy on two
datasets. Finally, we show that adversarial logit pairing achieves the
state-of-the-art defense on ImageNet against PGD white-box attacks, improving
accuracy from 1.5% to 27.9%. Adversarial logit pairing also successfully
damages the current state-of-the-art defense against black-box attacks on
ImageNet (Tramer et al., 2018), dropping its accuracy from 66.6% to 47.1%. With
this new accuracy drop, adversarial logit pairing ties with Tramer et al. (2018)
for the state of the art on black-box attacks on ImageNet.
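
As an illustration of the idea behind adversarial logit pairing, the following
minimal sketch adds a penalty on the difference between the logits of each
clean example and its adversarial counterpart to an ordinary classification
loss. It is a sketch under stated assumptions rather than the implementation
evaluated in this paper: the mean squared difference as the similarity
measure, the equal weighting of the clean and adversarial cross-entropy terms,
the function names, and the `pairing_weight` parameter are all assumptions
made for illustration.

```python
import math


def cross_entropy(logits, label):
    """Softmax cross-entropy for one example, using a stable log-sum-exp."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_z - logits[label]


def pairing_penalty(logits_a, logits_b):
    """Mean squared difference between two logit vectors.

    Assumption: any distance that encourages the two vectors to be
    similar could be substituted here.
    """
    return sum((a - b) ** 2 for a, b in zip(logits_a, logits_b)) / len(logits_a)


def adversarial_logit_pairing_loss(clean_logits, adv_logits, labels,
                                   pairing_weight=0.5):
    """Classification loss on clean and adversarial examples plus a pairing term.

    clean_logits, adv_logits: lists of per-example logit vectors, where
    adv_logits[i] comes from the adversarial counterpart of example i.
    pairing_weight: hypothetical coefficient on the pairing term.
    """
    n = len(labels)
    classification = sum(
        cross_entropy(c, y) + cross_entropy(a, y)
        for c, a, y in zip(clean_logits, adv_logits, labels)
    ) / (2 * n)
    pairing = sum(
        pairing_penalty(c, a) for c, a in zip(clean_logits, adv_logits)
    ) / n
    return classification + pairing_weight * pairing
```

When the clean and adversarial logits agree exactly, the pairing term vanishes
and the loss reduces to the averaged cross-entropy; the larger the gap between
the two logit vectors, the more the pairing term contributes.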