Attacking Bayes: On the Adversarial Robustness of Bayesian Neural Networks | AI Security Portal




Source

https://arxiv.org/abs/2404.19640

PDF

https://arxiv.org/pdf/2404.19640

Paper Information

Authors: Yunzhen Feng; Tim G. J. Rudner; Nikolaos Tsilivis; Julia Kempe
Published: 2024-04-27
Affiliation: New York University
Country: United States of America
Venue: Transactions on Machine Learning Research (TMLR)

Labels Estimated by AI

Adversarial Example Quantification of Uncertainty Watermark Evaluation

These labels were automatically added by AI and may be inaccurate.

Abstract

Adversarial examples have been shown to cause neural networks to fail on a wide range of vision and language tasks, but recent work has claimed that Bayesian neural networks (BNNs) are inherently robust to adversarial perturbations. In this work, we examine this claim. To study the adversarial robustness of BNNs, we investigate whether it is possible to successfully break state-of-the-art BNN inference methods and prediction pipelines using even relatively unsophisticated attacks for three tasks: (1) label prediction under the posterior predictive mean, (2) adversarial example detection with Bayesian predictive uncertainty, and (3) semantic shift detection. We find that BNNs trained with state-of-the-art approximate inference methods, and even BNNs trained with Hamiltonian Monte Carlo, are highly susceptible to adversarial attacks. We also identify various conceptual and experimental errors in previous works that claimed inherent adversarial robustness of BNNs and conclusively demonstrate that BNNs and uncertainty-aware Bayesian prediction pipelines are not inherently robust against adversarial attacks.
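As a rough illustration of task (1), the sketch below runs a single-step gradient-sign attack (FGSM) against the posterior predictive mean of a toy Bayesian classifier. Everything in it is an assumption made for illustration: the model is a linear softmax classifier rather than a neural network, the arrays `Ws` and `bs` stand in for S posterior weight samples (however they were obtained), the function names are hypothetical, and FGSM is used only as a familiar example of a simple attack, since the abstract does not name the attacks evaluated. The key point it shows is that the gradient is taken through the averaged prediction, not through a single sampled model.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def predictive_mean(x, Ws, bs):
    """Posterior predictive mean: softmax outputs averaged over S weight samples."""
    logits = np.einsum("skd,d->sk", Ws, x) + bs    # shape (S, K)
    return softmax(logits).mean(axis=0)            # shape (K,)

def predictive_entropy(p):
    """Entropy of the predictive mean, a common uncertainty score."""
    return float(-(p * np.log(p + 1e-12)).sum())

def grad_nll(x, y, Ws, bs):
    """Gradient w.r.t. x of -log( mean_s p_s(y | x) ), computed analytically."""
    probs = softmax(np.einsum("skd,d->sk", Ws, x) + bs)   # (S, K)
    onehot = np.eye(probs.shape[1])[y]                    # (K,)
    # d p_s[y] / d logits_s = p_s[y] * (e_y - p_s)
    dlogits = probs[:, [y]] * (onehot - probs)            # (S, K)
    grad_mean_py = np.einsum("sk,skd->d", dlogits, Ws) / Ws.shape[0]
    return -grad_mean_py / probs[:, y].mean()

def fgsm(x, y, Ws, bs, eps):
    """One gradient-sign step of size eps, clipped to the valid input range [0, 1]."""
    return np.clip(x + eps * np.sign(grad_nll(x, y, Ws, bs)), 0.0, 1.0)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    S, K, D = 8, 3, 5                      # samples, classes, input dimension (toy sizes)
    Ws = rng.normal(size=(S, K, D))        # stand-in for posterior weight samples
    bs = rng.normal(size=(S, K))
    x = rng.uniform(size=D)
    y = int(predictive_mean(x, Ws, bs).argmax())
    x_adv = fgsm(x, y, Ws, bs, eps=0.1)
    print("clean:", predictive_mean(x, Ws, bs), predictive_entropy(predictive_mean(x, Ws, bs)))
    print("adv:  ", predictive_mean(x_adv, Ws, bs), predictive_entropy(predictive_mean(x_adv, Ws, bs)))
```

Comparing `predictive_entropy` before and after the perturbation gives a feel for task (2): a detector that thresholds this score only helps if adversarial inputs reliably raise it, which is the kind of assumption the paper puts under test.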

External Datasets

MNIST

FashionMNIST

CIFAR-10

SVHN

References

Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Generating natural language adversarial examples

Moustafa Alzantot, Yash Sharma, Ahmed Elgohary, Bo-Jhang Ho, Mani B. Srivastava, Kai-Wei Chang

Published: 2018

International Conference on Machine Learning

Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples

Anish Athalye, Nicholas Carlini, David Wagner

Published: 2018

Proceedings of the 35th International Conference on Machine Learning

Synthesizing robust adversarial examples

Anish Athalye, Logan Engstrom, Andrew Ilyas, Kevin Kwok

Published: 2018

Benchmarking Bayesian Deep Learning on Diabetic Retinopathy Detection Tasks

Neil Band, Tim G. J. Rudner, Qixuan Feng, Angelos Filos, Zachary Nado, Michael W. Dusenberry, Ghassen Jerfel, Dustin Tran, Yarin Gal

Published: 2021

Bayesian adversarial spheres: Bayesian inference and adversarial examples in a noiseless setting

Artur Bekasov, Iain Murray

Published: 2018

The effect of prior Lipschitz continuity on the adversarial robustness of Bayesian neural networks

Arno Blaas, Stephen J Roberts

Published: 2021

University of Oxford

On the adversarial robustness of Bayesian machine learning models

Arno C Blaas

Published: 2021

Proceedings of Machine Learning Research

Weight uncertainty in neural networks

Charles Blundell, Julien Cornebise, Koray Kavukcuoglu, Daan Wierstra

Published: 2015

On the robustness of Bayesian neural networks to adversarial attacks

Luca Bortolussi, Ginevra Carbone, Luca Laurenti, Andrea Patane, Guido Sanguinetti, Matthew Wicker

Published: 2022

Conference on Neural Information Processing Systems (NeurIPS)

Robustness of Bayesian Neural Networks to Gradient-Based Attacks

Ginevra Carbone, Matthew Wicker, Luca Laurenti, Andrea Patane, Luca Bortolussi, Guido Sanguinetti

Published: 2020-02-11

Vulnerability to adversarial attacks is one of the principal hurdles to the adoption of deep learning in safety-critical applications. Despite significant efforts, both practical and theoretical, the problem remains open. In this paper, we analyse the geometry of adversarial attacks in the large-data, overparametrized limit for Bayesian Neural Networks (BNNs). We show that, in the limit, vulnerability to gradient-based attacks arises as a result of degeneracy in the data distribution, i.e., when the data lies on a lower-dimensional submanifold of the ambient space. As a direct consequence, we demonstrate that in the limit BNN posteriors are robust to gradient-based adversarial attacks. Experimental results on the MNIST and Fashion MNIST datasets with BNNs trained with Hamiltonian Monte Carlo and Variational Inference support this line of argument, showing that BNNs can display both high accuracy and robustness to gradient-based adversarial attacks.

Adversarial attack Robustness Evaluation Robustness Improvement Method

Proceedings of the AAAI Conference on Artificial Intelligence

Robustness guarantees for Bayesian inference with Gaussian processes

Luca Cardelli, Marta Kwiatkowska, Luca Laurenti, Andrea Patane

Published: 2019