Abstract
Since neural classifiers are known to be sensitive to adversarial
perturbations that alter their accuracy, \textit{certification methods} have
been developed to provide provable guarantees on the insensitivity of their
predictions to such perturbations. Furthermore, in safety-critical
applications, the frequentist interpretation of the confidence of a classifier
(also known as model calibration) can be of utmost importance. This property
can be measured via the Brier score or the expected calibration error. We show
that attacks can significantly harm calibration, and thus propose certified
calibration as worst-case bounds on calibration under adversarial
perturbations. Specifically, we produce analytic bounds for the Brier score and
approximate bounds on the expected calibration error via the solution of a
mixed-integer program. Finally, we propose novel calibration attacks and
demonstrate how they can improve model calibration through \textit{adversarial
calibration training}.
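For reference, the two calibration measures named in the abstract are standard. The sketch below shows the usual definitions: the multiclass Brier score as the mean squared difference between the predicted probability vector and the one-hot label, and the binned expected calibration error (ECE) as the bin-weighted gap between confidence and accuracy. This is a minimal illustration of the standard metrics, not the paper's certification procedure; the bin count and binning scheme are assumptions, and the paper may use a particular variant.

\begin{verbatim}
import numpy as np

def brier_score(probs, labels):
    """Multiclass Brier score: mean squared difference between the
    predicted probability vector and the one-hot true label."""
    n, k = probs.shape
    onehot = np.eye(k)[labels]
    return np.mean(np.sum((probs - onehot) ** 2, axis=1))

def expected_calibration_error(probs, labels, n_bins=15):
    """Binned ECE: weighted average gap between mean confidence and
    accuracy over equal-width confidence bins (n_bins is an assumption)."""
    confidences = probs.max(axis=1)
    predictions = probs.argmax(axis=1)
    accuracies = (predictions == labels).astype(float)
    bin_edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            ece += in_bin.mean() * abs(
                accuracies[in_bin].mean() - confidences[in_bin].mean())
    return ece
\end{verbatim}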