BruSLeAttack: A Query-Efficient Score-Based Black-Box Sparse Adversarial Attack
Abstract
We study the unique, less well understood problem of generating sparse adversarial samples solely by observing the scores a model returns in response to queries. Sparse attacks aim to find the minimum number of perturbations to model inputs (an l0-bounded perturbation) needed to craft adversarial examples and mislead model decisions. In contrast to query-based dense attacks against black-box models, constructing sparse adversarial perturbations is non-trivial even when models return confidence scores to queries in a score-based setting, because such an attack leads to i) an NP-hard problem and ii) a non-differentiable search space. We develop BruSLeAttack, a new, faster (more query-efficient) Bayesian algorithm for the problem. We conduct extensive attack evaluations, including an attack demonstration against a Machine Learning as a Service (MLaaS) offering, exemplified by Google Cloud Vision, and robustness testing of adversarial training regimes and a recent defense against black-box attacks. The proposed attack scales to achieve state-of-the-art attack success rates and query efficiency on standard computer vision tasks such as ImageNet across different model architectures. Our artefacts and DIY attack samples are available on GitHub. Importantly, our work facilitates faster evaluation of model vulnerabilities and heightens vigilance about the safety, security and reliability of deployed systems.
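To make the problem setting concrete, the following is a minimal sketch of a score-based sparse attack loop. It is not BruSLeAttack and does not reproduce its Bayesian search; it is a naive random-search baseline, shown only to illustrate the constraints the abstract describes: the attacker sees nothing but class scores, and may change at most k pixels. The names `l0_distance`, `random_sparse_attack`, and `toy_scores`, the toy linear classifier, and the choice of binary replacement values are all assumptions made for illustration.

```python
import numpy as np


def l0_distance(x, x_adv):
    """Count pixels (spatial positions) where x_adv differs from x."""
    # Images have shape (H, W, C); a pixel is changed if any channel differs.
    return int(np.any(x != x_adv, axis=-1).sum())


def random_sparse_attack(score_fn, x, true_label, k, max_queries, rng):
    """Naive random-search baseline (an illustration, NOT BruSLeAttack).

    score_fn(image) returns a 1-D array of class scores and is the only
    access to the model. Each candidate changes at most k pixels of x.
    The initial query on the clean image is not counted in the budget.
    """
    h, w, c = x.shape
    best, best_score = x.copy(), score_fn(x)[true_label]
    for q in range(1, max_queries + 1):
        cand = x.copy()
        # Pick k distinct pixel positions and overwrite them with 0/1 values.
        idx = rng.choice(h * w, size=k, replace=False)
        rows, cols = np.unravel_index(idx, (h, w))
        cand[rows, cols] = rng.integers(0, 2, size=(k, c)).astype(x.dtype)
        scores = score_fn(cand)
        if scores[true_label] < best_score:
            best, best_score = cand, scores[true_label]
        if int(np.argmax(scores)) != true_label:
            return cand, q, True  # misclassified within the pixel budget
    return best, max_queries, False


# Hypothetical stand-in for a black-box model: a random linear softmax classifier.
rng = np.random.default_rng(0)
W = rng.normal(size=(3, 8 * 8 * 3))


def toy_scores(img):
    logits = W @ img.reshape(-1)
    e = np.exp(logits - logits.max())
    return e / e.sum()


x = rng.random((8, 8, 3))
label = int(np.argmax(toy_scores(x)))
adv, queries, success = random_sparse_attack(
    toy_scores, x, label, k=5, max_queries=200, rng=rng
)
print(l0_distance(x, adv), queries, success)
```

Because the set of pixel positions is chosen combinatorially and the scores give no gradient with respect to that choice, blind sampling like this tends to waste queries; this is the inefficiency that a more query-efficient search strategy aims to reduce.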