Abstract
Robustness to adversarial attacks is typically evaluated with adversarial
accuracy. While essential, this metric does not capture all aspects of
robustness and in particular leaves out the question of how many perturbations
can be found for each point. In this work, we introduce an alternative
approach, adversarial sparsity, which quantifies how difficult it is to find a
successful perturbation given both an input point and a constraint on the
direction of the perturbation. We show that sparsity provides valuable insight
into neural networks in multiple ways: for instance, it illustrates important
differences between current state-of-the-art robust models that accuracy
analysis does not, and suggests approaches for improving their robustness. When
applied to broken defenses, which are effective against weak attacks but not
strong ones, sparsity can discriminate between totally ineffective and partially
effective defenses. Finally, with sparsity we can measure increases in
robustness that do not affect accuracy: we show, for example, that data
augmentation alone can increase adversarial robustness, without any
adversarial training.
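
As a concrete illustration of the idea, below is a minimal Monte-Carlo sketch
of how a direction-constrained sparsity estimate could be computed: sample
random perturbation directions around an input and count the fraction along
which no perturbation of the allowed size changes the prediction. The toy
linear classifier, the fixed step size eps, and uniform sampling of unit
directions are illustrative assumptions here, not the paper's exact
definition.

```python
import numpy as np

def predict(w, b, x):
    """Toy linear classifier: returns the predicted class (0 or 1)."""
    return int(w @ x + b > 0)

def adversarial_sparsity(w, b, x, eps, n_directions=1000, seed=0):
    """Monte-Carlo estimate of adversarial sparsity for one input.

    Samples random unit directions and checks whether stepping by eps
    along each direction flips the prediction. Sparsity is estimated as
    the fraction of directions in which no successful perturbation is
    found (one possible reading of the metric): higher sparsity means
    adversarial perturbations are harder to find for this point.
    """
    rng = np.random.default_rng(seed)
    clean_label = predict(w, b, x)
    robust = 0
    for _ in range(n_directions):
        d = rng.standard_normal(x.shape)
        d /= np.linalg.norm(d)                # unit direction
        if predict(w, b, x + eps * d) == clean_label:
            robust += 1                       # no adversarial example along d
    return robust / n_directions

# Usage: a point near the decision boundary has low sparsity, a point
# far from it has sparsity 1 (no direction succeeds at this eps).
w, b = np.array([1.0, -1.0]), 0.0
x_near = np.array([0.05, 0.0])   # close to the boundary w.x + b = 0
x_far = np.array([2.0, 0.0])     # far from the boundary
print(adversarial_sparsity(w, b, x_near, eps=0.1))
print(adversarial_sparsity(w, b, x_far, eps=0.1))
```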