Understanding Catastrophic Overfitting in Single-step Adversarial Training

Information in the literature database is collected automatically.

Source

https://arxiv.org/abs/2010.01799

PDF

https://arxiv.org/pdf/2010.01799

Paper Information

Authors: Hoki Kim, Woojin Lee, Jaewook Lee
Published: 2020-10-05
Updated: 2020-12-15
Affiliation: Seoul National University
Country: Korea
Conference: AAAI Conference on Artificial Intelligence (AAAI)

Labels Estimated by AI

Adversarial Learning, Poisoning, Robustness Evaluation

These labels were automatically added by AI and may be inaccurate.
For details, see About Literature Database.

Abstract

Although fast adversarial training has demonstrated both robustness and efficiency, the problem of "catastrophic overfitting" has been observed. This is a phenomenon in which, during single-step adversarial training, the robust accuracy against projected gradient descent (PGD) suddenly decreases to 0% after a few epochs, whereas the robust accuracy against fast gradient sign method (FGSM) increases to 100%. In this paper, we demonstrate that catastrophic overfitting is very closely related to the characteristic of single-step adversarial training which uses only adversarial examples with the maximum perturbation, and not all adversarial examples in the adversarial direction, which leads to decision boundary distortion and a highly curved loss surface. Based on this observation, we propose a simple method that not only prevents catastrophic overfitting, but also overrides the belief that it is difficult to prevent multi-step adversarial attacks with single-step adversarial training.
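
The contrast drawn in the abstract, between training only on the maximum-perturbation example and considering other points along the same adversarial direction, can be illustrated with a small sketch. This is not the authors' implementation. It assumes a toy logistic-regression classifier, and the helper names (`fgsm_direction`, `points_along_direction`, `smallest_misclassifying_scale`) and the `num_checkpoints` parameter are hypothetical, introduced only for illustration.

```python
import math


def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def predict_prob(w, b, x):
    """Probability of class 1 under a toy logistic-regression model."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))


def loss_grad_wrt_input(w, b, x, y):
    """Gradient of the binary cross-entropy loss with respect to the input x."""
    p = predict_prob(w, b, x)
    return [(p - y) * wi for wi in w]


def fgsm_direction(w, b, x, y, eps):
    """Maximum-magnitude single-step (FGSM) perturbation: eps * sign(grad)."""
    grad = loss_grad_wrt_input(w, b, x, y)
    return [eps * sign(g) for g in grad]


def points_along_direction(x, delta, num_checkpoints):
    """Yield (scale, x + scale * delta) for scale = 1/c, 2/c, ..., 1."""
    for k in range(1, num_checkpoints + 1):
        scale = k / num_checkpoints
        yield scale, [xi + scale * di for xi, di in zip(x, delta)]


def smallest_misclassifying_scale(w, b, x, y, eps, num_checkpoints=4):
    """Return the smallest checked scale at which x + scale * delta is
    misclassified, or 1.0 (the full perturbation) if none is.

    Illustrative assumption: inspecting intermediate points is one way to
    use "all adversarial examples in the adversarial direction" rather
    than only the endpoint.
    """
    delta = fgsm_direction(w, b, x, y, eps)
    for scale, x_adv in points_along_direction(x, delta, num_checkpoints):
        pred = 1 if predict_prob(w, b, x_adv) >= 0.5 else 0
        if pred != y:
            return scale
    return 1.0
```

For example, with `w = [1.0, -1.0]`, `b = 0.0`, `x = [0.3, 0.0]`, `y = 1`, and `eps = 0.4`, the input is already misclassified at half of the maximum perturbation, so a training step that looks only at the endpoint never sees the nearer adversarial example.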