In this paper we establish rigorous benchmarks for image classifier
robustness. Our first benchmark, ImageNet-C, standardizes and expands the
corruption robustness topic, while showing which classifiers are preferable in
safety-critical applications. Then we propose a new dataset called ImageNet-P,
which enables researchers to benchmark a classifier's robustness to common
perturbations. Unlike recent robustness research, this benchmark evaluates
performance on common corruptions and perturbations, not worst-case adversarial
perturbations. We find that there are negligible changes in relative corruption
robustness from AlexNet classifiers to ResNet classifiers. Afterward, we
discover ways to enhance corruption and perturbation robustness. We even find
that a bypassed adversarial defense provides substantial common perturbation
robustness. Together, our benchmarks may aid future work toward networks that
robustly generalize.