Can Implicit Bias Imply Adversarial Robustness?


Information in the literature database is collected automatically.

Source

https://arxiv.org/abs/2405.15942

PDF

https://arxiv.org/pdf/2405.15942

Paper Information

Authors: Hancheng Min; René Vidal
Published: 2024-05-25
Updated: 2024-06-05
Affiliation: Center for Innovation in Data Engineering and Science (IDEAS), University of Pennsylvania
Country: United States of America
Conference: International Conference on Machine Learning (ICML)

Labels Estimated by AI

Adversarial Training; Bias; Algorithm

These labels were automatically added by AI and may be inaccurate.

Abstract

The implicit bias of gradient-based training algorithms has been considered mostly beneficial as it leads to trained networks that often generalize well. However, Frei et al. (2023) show that such implicit bias can harm adversarial robustness. Specifically, they show that if the data consists of clusters with small inter-cluster correlation, a shallow (two-layer) ReLU network trained by gradient flow generalizes well, but it is not robust to adversarial attacks of small radius. Moreover, this phenomenon occurs despite the existence of a much more robust classifier that can be explicitly constructed from a shallow network. In this paper, we extend recent analyses of neuron alignment to show that a shallow network with a polynomial ReLU activation (pReLU) trained by gradient flow not only generalizes well but is also robust to adversarial attacks. Our results highlight the importance of the interplay between data structure and architecture design in the implicit bias and robustness of trained networks.
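To make the architecture named in the abstract concrete, the following is a minimal sketch of a shallow (two-layer) network with a polynomial ReLU activation. It assumes the activation is simply ReLU raised to an integer power p; the abstract does not give the definition, so the exact form used in the paper (including any normalization of the weights) may differ. The function name prelu_network and the parameters W, v, and p are hypothetical and chosen only for illustration.

```python
def relu(z):
    """Standard ReLU: max(z, 0)."""
    return max(z, 0.0)


def prelu_network(x, W, v, p=3):
    """Shallow network f(x) = sum_j v_j * relu(<w_j, x>) ** p.

    ASSUMPTION: "polynomial ReLU" is modelled here as ReLU raised to
    the power p. This is an illustrative guess, not the paper's
    stated definition. With p = 1 it reduces to an ordinary
    two-layer ReLU network.

    x: input vector (list of floats)
    W: hidden-layer weights, one list of floats per neuron
    v: output-layer weights, one float per neuron
    """
    out = 0.0
    for w_j, v_j in zip(W, v):
        # Pre-activation: inner product of the neuron weight and the input.
        pre = sum(w_i * x_i for w_i, x_i in zip(w_j, x))
        out += v_j * relu(pre) ** p
    return out


# Two neurons pointing in opposite directions along the first axis.
W = [[2.0, 0.0], [-1.0, 0.0]]
v = [1.0, -1.0]
x = [1.0, 0.0]

print(prelu_network(x, W, v, p=1))  # 2.0 (plain ReLU network)
print(prelu_network(x, W, v, p=3))  # 8.0 (cubic polynomial ReLU)
```

In this toy input only the first neuron is active, so the output is its pre-activation (2.0) raised to the power p. Raising p makes the output depend more sharply on how well the input aligns with a neuron's weight vector, which is the kind of architecture choice the abstract refers to.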

External Datasets

MNIST

Caltech256