Much research effort has been devoted to better understanding adversarial
examples, which are specially crafted inputs to machine-learning models that
are perceptually similar to benign inputs, but are classified differently
(i.e., misclassified). Both algorithms that create adversarial examples and
strategies for defending against them typically use $L_p$-norms to measure the
perceptual similarity between an adversarial input and its benign original.
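For instance, attacks and defenses typically constrain an adversarial example $x'$ to lie within a small $L_p$-norm ball around the benign input $x$; in the notation used here (the symbols $x$, $x'$, and the budget $\epsilon$ are illustrative, not fixed by this work), the constraint is
\[
\|x' - x\|_p \;=\; \Big(\sum_i |x'_i - x_i|^p\Big)^{1/p} \;\le\; \epsilon,
\]
with the limiting case $\|x' - x\|_\infty = \max_i |x'_i - x_i|$ for $p = \infty$.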
Prior work has already shown, however, that two images need not be close to
each other as measured by an $L_p$-norm to be perceptually similar. In this
work, we show that nearness according to an $L_p$-norm is not just unnecessary
for perceptual similarity, but is also insufficient. Specifically, focusing on
datasets (CIFAR10 and MNIST), $L_p$-norms, and thresholds used in prior work,
we show through online user studies that "adversarial examples" that are closer
to their benign counterparts than required by commonly used $L_p$-norm
thresholds can nevertheless be perceived by humans as different from the
corresponding benign examples. Namely, the perceptual distance between two
images that are "near" each other according to an $L_p$-norm can be high enough
that participants frequently classify the two images as representing different
objects or digits. Combined with prior work, we thus demonstrate that nearness
of inputs as measured by $L_p$-norms is neither necessary nor sufficient for
perceptual similarity, which has implications for both creating and defending
against adversarial examples. We propose and discuss alternative similarity
metrics to stimulate future research in the area.
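As a concrete illustration of the kind of threshold check discussed above, the following sketch (not the code used in our studies; pixel values are assumed to lie in $[0, 1]$, and the $\epsilon$ budgets are illustrative assumptions) computes $L_p$ distances and tests whether a perturbed image is "near" its benign original:
\begin{verbatim}
# Minimal sketch (not the paper's code): testing whether a perturbed image
# lies within an L_p-norm threshold of its benign original. Pixel values are
# assumed to be in [0, 1]; the epsilon budgets below are illustrative only.
import numpy as np

def lp_distance(x, x_adv, p):
    """L_p distance between two images, flattened to vectors."""
    diff = (x_adv - x).ravel()
    if np.isinf(p):
        return float(np.max(np.abs(diff)))
    return float(np.sum(np.abs(diff) ** p) ** (1.0 / p))

def within_threshold(x, x_adv, p, eps):
    """True if x_adv is 'near' x under the L_p norm and budget eps."""
    return lp_distance(x, x_adv, p) <= eps

# Example with a CIFAR10-sized image and a small random perturbation.
rng = np.random.default_rng(0)
x = rng.random((32, 32, 3))
x_adv = np.clip(x + rng.uniform(-0.03, 0.03, size=x.shape), 0.0, 1.0)

print(within_threshold(x, x_adv, p=np.inf, eps=8 / 255))  # illustrative budget
print(within_threshold(x, x_adv, p=2, eps=0.5))           # illustrative budget
\end{verbatim}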