Abstract
The existence of adversarial attacks on machine learning models that are
imperceptible to a human remains largely a mystery from a theoretical
perspective. In this work, we introduce two notions of adversarial attacks:
natural or on-manifold attacks, which are perceptible to a human/oracle, and
unnatural or off-manifold attacks, which are not. We argue that the existence
of off-manifold attacks
is a natural consequence of the dimension gap between the intrinsic and ambient
dimensions of the data. For 2-layer ReLU networks, we prove that even though
the dimension gap does not affect generalization performance on samples drawn
from the observed data space, it makes the clean-trained model more vulnerable
to adversarial perturbations in the off-manifold direction of the data space.
Our main results provide an explicit relationship between the $\ell_2$ and
$\ell_{\infty}$ attack strengths of the on- and off-manifold attacks and the
dimension gap.
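
The setting can be illustrated with a minimal numerical sketch, which is not the paper's construction: data confined to a $k$-dimensional linear subspace of $\mathbb{R}^D$ (so the dimension gap is $D-k$), a 2-layer ReLU network fit by plain gradient descent as a stand-in for clean training, and a comparison of the trained model's gradient components along on-manifold versus off-manifold directions. All dimensions, widths, and training choices below are hypothetical illustration parameters.

```python
# Illustrative sketch only (not the paper's proof construction): data on a
# k-dimensional subspace of R^D, a 2-layer ReLU network trained on clean
# samples, and gradient sensitivity split into on-/off-manifold components.
import numpy as np

rng = np.random.default_rng(0)
D, k, n, width = 50, 5, 2000, 200  # ambient dim, intrinsic dim, samples, hidden units

# Orthonormal basis for the data subspace and its orthogonal complement.
Q, _ = np.linalg.qr(rng.standard_normal((D, D)))
U_on, U_off = Q[:, :k], Q[:, k:]

# On-manifold data with labels from a linear teacher inside the subspace.
Z = rng.standard_normal((n, k))
X = Z @ U_on.T
w_teacher = rng.standard_normal(k)
y = np.sign(Z @ w_teacher)

# 2-layer ReLU network f(x) = a^T relu(W x), trained by gradient descent on
# squared loss (a simple stand-in for "clean training").
W = rng.standard_normal((width, D)) / np.sqrt(D)
a = rng.standard_normal(width) / np.sqrt(width)
lr = 0.05
for _ in range(500):
    H = np.maximum(X @ W.T, 0.0)                     # (n, width) hidden activations
    err = H @ a - y                                  # residuals on clean data
    grad_a = H.T @ err / n
    grad_W = ((err[:, None] * (H > 0)) * a).T @ X / n
    a -= lr * grad_a
    W -= lr * grad_W

# Gradient of f at a clean point, split into on-/off-manifold components.
# The abstract's claim concerns how the off-manifold sensitivity grows with
# the dimension gap D - k, even though clean generalization is unaffected.
x0 = X[0]
g = ((x0 @ W.T > 0) * a) @ W
print("on-manifold grad norm :", np.linalg.norm(U_on.T @ g))
print("off-manifold grad norm:", np.linalg.norm(U_off.T @ g))
```

A nonzero off-manifold gradient component in such a sketch indicates that an attacker can perturb the input orthogonally to the data subspace, i.e., in directions never seen during clean training, which is the off-manifold vulnerability the abstract refers to.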