Getting a-Round Guarantees: Floating-Point Attacks on Certified Robustness
Abstract
Adversarial examples pose a security risk because they can alter the decisions of a machine learning classifier through slight input perturbations. Certified robustness has been proposed as a mitigation: given an input $\mathbf{x}$, a classifier returns a prediction and a certified radius $R$, with a provable guarantee that any perturbation to $\mathbf{x}$ whose norm is bounded by $R$ will not alter the classifier's prediction. In this work, we show that these guarantees can be invalidated by the rounding errors inherent in floating-point representation. We design a rounding search method that efficiently exploits this vulnerability to find adversarial examples against state-of-the-art certifications in two threat models that differ in how the norm of the perturbation is computed. We show that the attack can be carried out against linear classifiers, which have exact certifiable guarantees, and against neural networks, which have conservative certifications. In the weak threat model, our experiments demonstrate attack success rates over 50% on random linear classifiers, up to 23% on the MNIST dataset for linear SVM, and up to 15% for a neural network. In the strong threat model, the success rates are lower but positive. The floating-point errors exploited by our attacks range from small to large (e.g., $10^{-13}$ to $10^{3}$), showing that even negligible errors can be systematically exploited to invalidate the guarantees provided by certified robustness. Finally, we propose a formal mitigation approach based on rounded interval arithmetic, and we encourage future implementations of robustness certificates to account for the limitations of modern computing architectures in order to provide sound certifiable guarantees.
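To make the mitigation idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes a binary linear classifier $f(\mathbf{x}) = \mathbf{w} \cdot \mathbf{x} + b$, for which the standard exact $\ell_2$ certified radius is the distance to the decision hyperplane, $R = |\mathbf{w} \cdot \mathbf{x} + b| / \lVert \mathbf{w} \rVert_2$. Python does not expose IEEE 754 directed rounding modes, so the sketch widens every intermediate result by one floating-point step with `math.nextafter` (Python 3.9+) as a conservative stand-in for rounded interval arithmetic. All function names and the example values are hypothetical.

```python
import math
from fractions import Fraction


def down(v):
    """Next float toward -inf; a conservative stand-in for rounding down."""
    return math.nextafter(v, -math.inf)


def up(v):
    """Next float toward +inf; a conservative stand-in for rounding up."""
    return math.nextafter(v, math.inf)


def naive_radius(w, x, b):
    """Radius |w.x + b| / ||w||_2 in plain round-to-nearest arithmetic."""
    margin = b
    norm_sq = 0.0
    for wi, xi in zip(w, x):
        margin += wi * xi
        norm_sq += wi * wi
    return abs(margin) / math.sqrt(norm_sq)


def sound_radius_lower_bound(w, x, b):
    """Lower bound on the radius, widening outward after every operation."""
    lo = hi = b      # enclosure [lo, hi] of the exact margin w.x + b
    n_hi = 0.0       # upper bound on the exact squared norm of w
    for wi, xi in zip(w, x):
        p = wi * xi
        lo = down(lo + down(p))
        hi = up(hi + up(p))
        n_hi = up(n_hi + up(wi * wi))
    n_hi = up(math.sqrt(n_hi))  # upper bound on ||w||_2
    if lo > 0.0:
        m_lo = lo               # margin is provably positive
    elif hi < 0.0:
        m_lo = -hi              # margin is provably negative
    else:
        return 0.0              # sign of the margin is not certain
    return max(0.0, down(m_lo / n_hi))


def is_sound(w, x, b, r):
    """Check r^2 * ||w||^2 <= (w.x + b)^2 exactly, using rationals."""
    W = [Fraction(v) for v in w]
    X = [Fraction(v) for v in x]
    margin = sum(wi * xi for wi, xi in zip(W, X)) + Fraction(b)
    norm_sq = sum(wi * wi for wi in W)
    return Fraction(r) ** 2 * norm_sq <= margin * margin


# Hypothetical example values, chosen only for illustration.
w = [0.1, -0.3, 0.7]
x = [1.5, 2.25, -0.4]
b = 0.05

print("naive radius:      ", naive_radius(w, x, b))
print("sound lower bound: ", sound_radius_lower_bound(w, x, b))
print("lower bound is sound:", is_sound(w, x, b, sound_radius_lower_bound(w, x, b)))
```

The naive radius may land slightly above or below the true value depending on how the rounding errors fall, which is the kind of gap the abstract describes as exploitable. The widened version gives up a few units in the last place so that the reported radius never exceeds the true one. The exact rational check treats the stored floating-point inputs as the true inputs and compares squared quantities to avoid the irrational square root.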