Recent studies show that widely used deep neural networks (DNNs) are
vulnerable to carefully crafted adversarial examples. Many advanced algorithms
have been proposed to generate adversarial examples by leveraging the
$\mathcal{L}_p$ distance for penalizing perturbations. Researchers have
explored a variety of methods to defend against such adversarial attacks.
While the effectiveness of the $\mathcal{L}_p$ distance as a metric of perceptual
quality remains an active research area, in this paper we focus on
a different type of perturbation, namely spatial transformation, rather than
directly manipulating pixel values as in prior works. Perturbations
generated through spatial transformation could result in large $\mathcal{L}_p$
distance measures, but our extensive experiments show that such spatially
transformed adversarial examples are perceptually realistic and more difficult
to defend against with existing defense systems. This potentially provides a
new direction in adversarial example generation and the design of corresponding
defenses. We visualize the spatial-transformation-based perturbations for
different examples and show that our technique can produce realistic
adversarial examples with smooth image deformation. Finally, we visualize the
attention of deep networks with different types of adversarial examples to
better understand how these examples are interpreted.
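To make the notion of a spatial-transformation perturbation concrete, the following is a minimal sketch, not the attack proposed in this paper, of how a smooth flow field warps an image: the per-pixel differences can be large in $\mathcal{L}_p$ terms even when the deformation itself is perceptually minor. It assumes PyTorch is available; the function name `warp_with_flow`, the flow magnitude, and the image size are illustrative choices rather than values taken from our experiments.

```python
# Illustrative sketch: warp an image with a small, smooth flow field.
# Even a mild deformation can produce a large pixel-space (L_p) difference.
import torch
import torch.nn.functional as F

def warp_with_flow(image, flow):
    """Bilinearly resample `image` (N,C,H,W) at positions displaced by `flow`
    (N,H,W,2), where the flow is given in normalized [-1, 1] coordinates."""
    n, c, h, w = image.shape
    # Identity sampling grid in normalized coordinates (x, y order for grid_sample).
    ys, xs = torch.meshgrid(
        torch.linspace(-1, 1, h), torch.linspace(-1, 1, w), indexing="ij"
    )
    base_grid = torch.stack((xs, ys), dim=-1).unsqueeze(0).expand(n, -1, -1, -1)
    # Shift every sampling location by the flow and resample the image.
    return F.grid_sample(image, base_grid + flow, align_corners=True)

# Example: a uniform 2% shift is perceptually minor but changes many pixel values.
image = torch.rand(1, 3, 32, 32)
flow = 0.02 * torch.ones(1, 32, 32, 2)   # illustrative flow magnitude
warped = warp_with_flow(image, flow)
print((warped - image).abs().max())      # per-pixel differences can be large
```

In an actual attack one would optimize the flow field (typically with a smoothness penalty on neighboring displacements) so that the warped image is misclassified; the sketch above only shows the warping operation that such an optimization would act on.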