Adversarial Training Should Be Cast as a Non-Zero-Sum Game

Source

https://arxiv.org/abs/2306.11035

PDF

https://arxiv.org/pdf/2306.11035

Paper Information

Authors: Alexander Robey; Fabian Latorre; George J. Pappas; Hamed Hassani; Volkan Cevher
Published: 2023-06-20
Updated: 2024-03-19
Affiliation: University of Pennsylvania
Country: United States of America
Conference: International Conference on Learning Representations (ICLR)

Labels Estimated by AI

Adversarial Example; Optimization Methods; Algorithm

These labels were automatically added by AI and may be inaccurate.

Abstract

One prominent approach toward resolving the adversarial vulnerability of deep neural networks is the two-player zero-sum paradigm of adversarial training, in which predictors are trained against adversarially chosen perturbations of data. Despite the promise of this approach, algorithms based on this paradigm have not engendered sufficient levels of robustness and suffer from pathological behavior like robust overfitting. To understand this shortcoming, we first show that the surrogate-based relaxation commonly used in adversarial training algorithms voids all guarantees on the robustness of trained classifiers. The identification of this pitfall informs a novel non-zero-sum bilevel formulation of adversarial training, wherein each player optimizes a different objective function. Our formulation yields a simple algorithmic framework that matches and in some cases outperforms state-of-the-art attacks, attains comparable levels of robustness to standard adversarial training algorithms, and does not suffer from robust overfitting.
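
The abstract's point about surrogate losses can be illustrated with a small numeric sketch. It is not taken from the paper: the logit values, the candidate names "A" and "B", and the helper functions `cross_entropy` and `negative_margin` are all hypothetical, chosen only to show that an attacker maximizing a cross-entropy surrogate may prefer a perturbation that leaves the prediction correct, while an attacker maximizing the classification margin directly picks one that flips it.

```python
import math


def cross_entropy(logits, label):
    """Softmax cross-entropy of the true label (a common surrogate loss)."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def negative_margin(logits, label):
    """Largest wrong-class logit minus the true-class logit.

    A positive value means the input is misclassified.
    """
    best_wrong = max(z for j, z in enumerate(logits) if j != label)
    return best_wrong - logits[label]


# Hypothetical logits produced by two candidate perturbations of one input
# whose true label is class 0. These numbers are made up for illustration.
label = 0
candidates = {
    "A": [2.0, 1.9, 1.9],    # still classified correctly
    "B": [2.0, 2.1, -10.0],  # misclassified as class 1
}

# Attacker that maximizes the surrogate loss.
surrogate_pick = max(candidates, key=lambda k: cross_entropy(candidates[k], label))
# Attacker that maximizes the margin, i.e. targets misclassification directly.
margin_pick = max(candidates, key=lambda k: negative_margin(candidates[k], label))

for name, logits in candidates.items():
    print(name, round(cross_entropy(logits, label), 4),
          round(negative_margin(logits, label), 4))
print("surrogate attacker picks:", surrogate_pick)
print("margin attacker picks:", margin_pick)
```

With these assumed logits, the surrogate-maximizing attacker selects candidate A, which is not an adversarial example, whereas the margin-maximizing attacker selects candidate B, which is. This is only a toy instance of the gap between the two objectives; the paper's actual formulation and algorithm are given in the linked PDF.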