Understanding Catastrophic Overfitting in Single-step Adversarial Training

Authors: Hoki Kim, Woojin Lee, Jaewook Lee | Published: 2020-10-05 | Updated: 2020-12-15

2020.10.05 | 2025.05.28


Source: https://arxiv.org/abs/2010.01799

PDF: https://arxiv.org/pdf/2010.01799

Labels Predicted by AI

Adversarial Learning, Poisoning, Robustness Evaluation

Please note that these labels were automatically added by AI. Therefore, they may not be entirely accurate.
For more details, please see the About the Literature Database page.

Abstract

Although fast adversarial training has demonstrated both robustness and efficiency, the problem of “catastrophic overfitting” has been observed. This is a phenomenon in which, during single-step adversarial training, the robust accuracy against projected gradient descent (PGD) suddenly decreases to 0% after a few epochs, whereas the robust accuracy against fast gradient sign method (FGSM) increases to 100%. In this paper, we demonstrate that catastrophic overfitting is very closely related to the characteristic of single-step adversarial training which uses only adversarial examples with the maximum perturbation, and not all adversarial examples in the adversarial direction, which leads to decision boundary distortion and a highly curved loss surface. Based on this observation, we propose a simple method that not only prevents catastrophic overfitting, but also overrides the belief that it is difficult to prevent multi-step adversarial attacks with single-step adversarial training.
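
The distinction the abstract draws, between using only the maximum-perturbation example and using all examples along the adversarial direction, can be illustrated with a minimal sketch. This is not the authors' implementation: the toy linear model, the logistic loss, the function names, and the example values (`w`, `b`, `x`, `y`, `eps`, `num_points`) are all assumptions chosen for illustration. It computes the FGSM direction as the sign of the input gradient, then builds both the single maximum-perturbation example and a set of evenly spaced examples along the same direction.

```python
import math

def sign(v):
    # Returns -1, 0, or 1 as an int.
    return (v > 0) - (v < 0)

def logistic_loss_and_grad(w, b, x, y):
    """Binary logistic loss for a label y in {-1, +1}, plus its gradient w.r.t. x.

    The linear model w.x + b is a toy stand-in (assumption), not the
    network architecture used in the paper.
    """
    margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
    loss = math.log1p(math.exp(-margin))
    # d(loss)/dx = -y * sigmoid(-margin) * w
    coeff = -y / (1.0 + math.exp(margin))
    return loss, [coeff * wi for wi in w]

def fgsm_direction(w, b, x, y):
    """Adversarial direction used by FGSM: the sign of the input gradient."""
    _, grad_x = logistic_loss_and_grad(w, b, x, y)
    return [sign(g) for g in grad_x]

def perturb(x, direction, step):
    """Move x by `step` along the given direction."""
    return [xi + step * di for xi, di in zip(x, direction)]

def examples_along_direction(x, direction, eps, num_points):
    """Evenly spaced examples x + (k / num_points) * eps * direction, k = 1..num_points.

    The last element is the maximum-perturbation example.
    """
    return [perturb(x, direction, eps * k / num_points)
            for k in range(1, num_points + 1)]

# Hypothetical toy values, chosen only for illustration.
w, b = [2.0, -1.0], 0.0
x, y, eps = [0.5, 0.25], 1, 0.3

direction = fgsm_direction(w, b, x, y)
x_max = perturb(x, direction, eps)  # the only example single-step training uses
inner = examples_along_direction(x, direction, eps, num_points=3)
```

For this toy linear model the loss grows monotonically along the adversarial direction, so the intermediate examples add nothing beyond `x_max`. The abstract's point concerns networks whose loss surface is highly curved, where points between `x` and `x_max` can behave differently from `x_max` itself.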