Although much progress has been made towards robust deep learning, a
significant gap in robustness remains between real-world perturbations and the more
narrowly defined sets typically studied in adversarial defenses. In this paper,
we aim to bridge this gap by learning perturbation sets from data, in order to
characterize real-world effects for robust training and evaluation.
Specifically, we use a conditional generator that defines the perturbation set
over a constrained region of the latent space. We formulate desirable
properties that measure the quality of a learned perturbation set, and
theoretically prove that a conditional variational autoencoder naturally
satisfies these criteria. Using this framework, our approach can generate a
variety of perturbations at different complexities and scales, ranging from
baseline spatial transformations, through common image corruptions, to lighting
variations. We measure the quality of our learned perturbation sets both
quantitatively and qualitatively, finding that our models are capable of
producing a diverse set of meaningful perturbations beyond the limited data
seen during training. Finally, we leverage our learned perturbation sets to
train models that are empirically and certifiably robust to adversarial image
corruptions and adversarial lighting variations, while improving generalization
on non-adversarial data. All code and configuration files for reproducing the
experiments, as well as pretrained model weights, can be found at
https://github.com/locuslab/perturbation_learning.
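
As a minimal sketch of the idea summarized above, the learned perturbation set can be viewed as the image of a norm-bounded region of a conditional generator's latent space. The `ConditionalDecoder` module, its dimensions, the l2 constraint, and the radius `eps` below are illustrative assumptions, not the repository's actual architecture or API.

```python
# Minimal sketch (not the paper's exact implementation): a perturbation set
# defined as the image of a norm-bounded latent ball under a conditional
# generator g(z; x), as described in the abstract.
import torch
import torch.nn as nn

class ConditionalDecoder(nn.Module):
    """Illustrative conditional generator g(z; x): maps a latent code z,
    conditioned on a clean input x, to a perturbed version of x."""
    def __init__(self, x_dim=784, z_dim=8, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(x_dim + z_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, x_dim), nn.Sigmoid(),
        )

    def forward(self, z, x):
        return self.net(torch.cat([z, x], dim=-1))

def sample_perturbation_set(g, x, eps=1.0, n=16, z_dim=8):
    """Draw random elements of the learned perturbation set
    S(x) = { g(z; x) : ||z||_2 <= eps } by sampling z inside the eps-ball."""
    z = torch.randn(n, z_dim)
    z = z * (eps * torch.rand(n, 1)) / z.norm(dim=-1, keepdim=True)  # rescale into the ball
    return g(z, x.expand(n, -1))

if __name__ == "__main__":
    g = ConditionalDecoder()   # untrained here; stands in for a trained CVAE decoder
    x = torch.rand(1, 784)     # a flattened example image
    perturbed = sample_perturbation_set(g, x, eps=1.0)
    print(perturbed.shape)     # torch.Size([16, 784])
```

Under this view, robust training and evaluation optimize over the latent code z subject to the same norm constraint (for example, with projected gradient steps on z) rather than over raw pixel perturbations.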