Today's state-of-the-art image classifiers misclassify carefully
manipulated adversarial images. In this work, we develop a new,
localized adversarial attack that generates adversarial examples by
imperceptibly altering the backgrounds of normal images. We first use this
attack to highlight the undue sensitivity of neural networks to changes
in an image's background, and then use it as part of a new training technique:
localized adversarial training. By including localized adversarial images in the
training set, we train a classifier that incurs lower loss than a
non-adversarially trained counterpart on both natural and adversarial
inputs. Evaluating our localized adversarial training algorithm on the MNIST
and CIFAR-10 datasets shows a reduced loss of accuracy on natural images and
increased robustness against adversarial inputs.
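
To make the attack concrete, the sketch below shows one way a background-restricted perturbation could be computed, assuming a PyTorch classifier, images in [0, 1], and a per-image binary background mask. The function name localized_pgd, the PGD inner loop, and all hyperparameters are illustrative assumptions, not the paper's exact procedure.

```python
import torch
import torch.nn.functional as F

def localized_pgd(model, x, y, mask, eps=8/255, alpha=2/255, steps=10):
    """Craft an adversarial example by perturbing only background pixels.

    x:    images of shape (N, C, H, W) with values in [0, 1]
    y:    ground-truth labels of shape (N,)
    mask: binary mask of shape (N, 1, H, W), 1 where the pixel is background
    """
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        with torch.no_grad():
            # Ascend the loss, but write the update only into background pixels.
            x_adv = x_adv + alpha * grad.sign() * mask
            # Keep the change imperceptible: project back into an eps-ball
            # around the original image, then into the valid pixel range.
            x_adv = torch.clamp(x_adv, x - eps, x + eps)
            x_adv = torch.clamp(x_adv, 0.0, 1.0)
    return x_adv.detach()
```

In the corresponding training loop, such localized adversarial images would simply be mixed into (or substituted for) each minibatch before the usual gradient step.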