Despite significant advances, deep networks remain highly susceptible to adversarial attacks. One fundamental challenge is that small input perturbations
can often produce large movements in the network's final-layer feature space.
In this paper, we define an attack model that abstracts this challenge in order to better understand its intrinsic properties. In our model, the adversary may move data
an arbitrary distance in feature space but only in random low-dimensional
subspaces. We prove that such adversaries can be quite powerful: they can defeat any algorithm that must classify every input it is given. However, we show that such adversaries can be overcome when the algorithm is allowed to abstain on unusual inputs, provided the classes are reasonably well-separated in feature space. We further provide strong theoretical guarantees for using data-driven methods to set algorithm parameters that optimize the accuracy-abstention trade-off. Our
results provide new robustness guarantees for nearest-neighbor-style algorithms and also apply to contrastive learning, where we empirically demonstrate that such algorithms can achieve high robust accuracy with low abstention rates. Our model is further motivated by strategic classification, where entities being classified aim to manipulate their observable features to obtain a preferred classification, and we provide new insights into that area as well.
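
To make the setting concrete, here is a minimal sketch, assuming a Euclidean feature space: the adversary may move a point an arbitrary distance, but only within a random k-dimensional subspace, and the defense is a nearest-neighbor classifier that abstains whenever a query lies farther than a threshold tau from all training data. All names and parameters (k, tau, scale) are illustrative assumptions rather than the paper's actual algorithm or guarantees, and for simplicity the displacement within the random subspace is itself sampled randomly, whereas in the model the adversary chooses it.

```python
# Illustrative sketch only: a random-subspace attack in feature space and an
# abstaining 1-nearest-neighbor defense. All names and parameters are assumed.
import numpy as np

rng = np.random.default_rng(0)

def random_subspace_attack(x, k, scale=100.0):
    """Move x an effectively arbitrary distance, but only inside a random
    k-dimensional subspace of the d-dimensional feature space."""
    d = x.shape[0]
    basis, _ = np.linalg.qr(rng.standard_normal((d, k)))  # orthonormal d x k basis
    return x + basis @ (scale * rng.standard_normal(k))   # large shift in the subspace

def abstaining_1nn(x, train_X, train_y, tau):
    """Return the nearest neighbor's label, or None (abstain) when x is
    'unusual', i.e., farther than tau from every training point."""
    dists = np.linalg.norm(train_X - x, axis=1)
    i = int(np.argmin(dists))
    return None if dists[i] > tau else train_y[i]

# Toy data: two well-separated classes in a d-dimensional feature space.
d, k, tau = 64, 2, 8.0
train_X = np.vstack([rng.normal(0.0, 0.5, (50, d)),
                     rng.normal(8.0, 0.5, (50, d))])
train_y = np.array([0] * 50 + [1] * 50)

x = rng.normal(0.0, 0.5, d)            # a clean class-0 query
x_adv = random_subspace_attack(x, k)   # pushed far along a random subspace

print(abstaining_1nn(x, train_X, train_y, tau))      # 0: close to class-0 data
print(abstaining_1nn(x_adv, train_X, train_y, tau))  # None w.h.p.: far from all data
```

The clean query is classified normally, while the attacked query typically lands far from every training point and triggers abstention, mirroring the separation condition under which the abstract's guarantees apply.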