Machine learning classifiers with high test accuracy often perform poorly
under adversarial attacks. It is commonly believed that adversarial training
alleviates this issue. In this paper, we demonstrate that, surprisingly, the
opposite may be true: even though adversarial training helps when enough data
is available, it may hurt robust generalization in the small-sample-size
regime. We first prove this phenomenon in a high-dimensional linear
classification setting with noiseless observations. Our proof provides
explanatory insights that may also transfer to feature learning models.
Further, we observe in experiments on standard image datasets that the same
behavior occurs for perceptible attacks that effectively reduce class
information, such as mask attacks and object corruptions.
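To make the linear setting concrete, the sketch below illustrates adversarial training of a linear classifier under an l-infinity perturbation of radius eps. It is a minimal illustrative example, not the paper's exact construction: for a linear model f(x) = w·x with labels y in {-1, +1}, the worst-case l-infinity perturbation of the logistic loss has the closed form x - eps * y * sign(w), so the inner maximization can be solved exactly. All function and parameter names here are assumptions chosen for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def adversarial_train(X, y, eps=0.1, lr=0.01, epochs=200):
    """Adversarial training of a linear classifier (illustrative sketch).

    X: (n, d) data matrix, y: labels in {-1, +1}, eps: l_inf attack radius.
    """
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        # Exact worst-case l_inf perturbation for a linear model:
        # each coordinate of x is shifted by eps against the margin.
        X_adv = X - eps * y[:, None] * np.sign(w)
        margins = y * (X_adv @ w)
        # Gradient of the logistic loss evaluated on the perturbed data.
        grad = -(X_adv * (y * sigmoid(-margins))[:, None]).mean(axis=0)
        w -= lr * grad
    return w
```

Setting eps = 0 recovers standard (non-robust) training, so the same routine can be used to compare standard and adversarial training as the number of samples n varies.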