Recently demonstrated physical-world adversarial attacks have exposed
vulnerabilities in perception systems that pose severe risks for
safety-critical applications such as autonomous driving. These attacks place
adversarial artifacts in the physical world, which indirectly adds a universal
patch to the model's inputs that can fool the model in a variety of contexts.
Adversarial training is the most effective defense against
image-dependent adversarial attacks. However, tailoring adversarial training to
universal patches is computationally expensive since the optimal universal
patch depends on the model weights, which change during training. We propose
meta adversarial training (MAT), a novel combination of adversarial training
with meta-learning, which overcomes this challenge by meta-learning universal
patches along with model training. MAT requires little extra computation while
continuously adapting a large set of patches to the current model. MAT
considerably increases robustness against universal patch attacks on image
classification and traffic-light detection.
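
The idea of adapting a pool of universal patches alongside the model can be
illustrated with a minimal sketch. It is not the authors' implementation: it
assumes a toy linear softmax classifier, a single signed gradient-ascent step
as the inner attack, and a simple interpolation of the stored patch toward its
adapted version as the meta update. All names (mat_step, apply_patch,
patch_lr, meta_lr) and the toy data are hypothetical and chosen only for
illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def apply_patch(x, patch, top, left):
    """Paste `patch` into every image of the batch at (top, left)."""
    x = x.copy()
    p = patch.shape[0]
    x[:, top:top + p, left:left + p] = patch
    return x

def loss_and_grads(W, x, y):
    """Cross-entropy of a linear softmax model, with gradients w.r.t. W and x."""
    n = x.shape[0]
    flat = x.reshape(n, -1)
    probs = softmax(flat @ W)
    loss = -np.log(probs[np.arange(n), y] + 1e-12).mean()
    dlogits = probs.copy()
    dlogits[np.arange(n), y] -= 1.0
    dlogits /= n
    grad_W = flat.T @ dlogits
    grad_x = (dlogits @ W.T).reshape(x.shape)
    return loss, grad_W, grad_x

def mat_step(W, patches, x, y, lr=0.1, patch_lr=0.05, meta_lr=0.5):
    """One joint step: adapt a sampled patch, store it, train on it."""
    k = rng.integers(len(patches))            # sample a patch from the pool
    p = patches[k].shape[0]
    top = rng.integers(x.shape[1] - p + 1)    # random patch location
    left = rng.integers(x.shape[2] - p + 1)

    # Inner attack (assumed): one signed gradient-ascent step on the patch.
    _, _, grad_x = loss_and_grads(W, apply_patch(x, patches[k], top, left), y)
    grad_patch = grad_x[:, top:top + p, left:left + p].sum(axis=0)
    adapted = np.clip(patches[k] + patch_lr * np.sign(grad_patch), 0.0, 1.0)

    # Meta update (assumed): move the stored patch toward the adapted one.
    patches[k] = patches[k] + meta_lr * (adapted - patches[k])

    # Model update: ordinary gradient descent on the patched batch.
    loss, grad_W, _ = loss_and_grads(W, apply_patch(x, adapted, top, left), y)
    return W - lr * grad_W, loss

def demo(steps=50):
    # Toy data: 8x8 "images", label says whether the top half is brighter.
    x = rng.random((64, 8, 8))
    y = (x[:, :4].mean(axis=(1, 2)) > x[:, 4:].mean(axis=(1, 2))).astype(int)
    W = np.zeros((64, 2))
    patches = rng.random((16, 3, 3))          # pool of universal patches
    loss = float("nan")
    for _ in range(steps):
        idx = rng.integers(0, len(x), size=32)
        W, loss = mat_step(W, patches, x[idx], y[idx])
    return W, patches, loss
```

In this sketch each training step costs only one extra forward and backward
pass for the patch update, and the pool is refreshed gradually as the model
weights change, so no patch has to be optimized from scratch against the
current model.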