BruSLeAttack: A Query-Efficient Score-Based Black-Box Sparse Adversarial Attack

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition

Infinite variational autoencoder for semi-supervised learning

Ehsan Abbasnejad, Anthony Dick, Anton van den Hengel

Published: 2017

arxiv

被引用数 10

European Conference on Computer Vision (ECCV)

Square Attack: a query-efficient black-box adversarial attack via random search

Maksym Andriushchenko, Francesco Croce, Nicolas Flammarion, Matthias Hein

Published: 2019.11.30

We propose the Square Attack, a score-based black-box $l_2$- and $l_\infty$-adversarial attack that does not rely on local gradient information and thus is not affected by gradient masking. Square Attack is based on a randomized search scheme which selects localized square-shaped updates at random positions so that at each iteration the perturbation is situated approximately at the boundary of the feasible set. Our method is significantly more query efficient and achieves a higher success rate compared to the state-of-the-art methods, especially in the untargeted setting. In particular, on ImageNet we improve the average query efficiency in the untargeted setting for various deep networks by a factor of at least $1.8$ and up to $3$ compared to the recent state-of-the-art $l_\infty$-attack of Al-Dujaili & O'Reilly. Moreover, although our attack is black-box, it can also outperform gradient-based white-box attacks on the standard benchmarks achieving a new state-of-the-art in terms of the success rate. The code of our attack is available at https://github.com/max-andr/square-attack.

敵対的攻撃検出敵対的訓練モデル評価

International conference on machine learning

Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples

Anish Athalye, Nicholas Carlini, David Wagner

Published: 2018

arxiv

被引用数 1

IEEE/CVF International Conference on Computer Vision (ICCV)

Guessing Smart: Biased Sampling for Efficient Black-Box Adversarial Attacks

Thomas Brunner, Frederik Diehl, Michael Truong Le, Alois Knoll

Published: 2018.12.24

We consider adversarial examples for image classification in the black-box decision-based setting. Here, an attacker cannot access confidence scores, but only the final label. Most attacks for this scenario are either unreliable or inefficient. Focusing on the latter, we show that a specific class of attacks, Boundary Attacks, can be reinterpreted as a biased sampling framework that gains efficiency from domain knowledge. We identify three such biases, image frequency, regional masks and surrogate gradients, and evaluate their performance against an ImageNet classifier. We show that the combination of these biases outperforms the state of the art by a wide margin. We also showcase an efficient way to attack the Google Cloud Vision API, where we craft convincing perturbations with just a few hundred queries. Finally, the methods we propose have also been found to work very well against strong defenses: Our targeted attack won second place in the NeurIPS 2018 Adversarial Vision Challenge.

敵対的サンプルの検知モデルの頑健性保証ロバスト性

arxiv

被引用数 1

IEEE Symposium on Security and Privacy

Towards Evaluating the Robustness of Neural Networks

Nicholas Carlini, David Wagner

Published: 2016.8.17

Neural networks provide state-of-the-art results for most machine learning tasks. Unfortunately, neural networks are vulnerable to adversarial examples: given an input $x$ and any target classification $t$, it is possible to find a new input $x'$ that is similar to $x$ but classified as $t$. This makes it difficult to apply neural networks in security-critical areas. Defensive distillation is a recently proposed approach that can take an arbitrary neural network, and increase its robustness, reducing the success rate of current attacks' ability to find adversarial examples from $95\%$ to $0.5\%$. In this paper, we demonstrate that defensive distillation does not significantly increase the robustness of neural networks by introducing three new attack algorithms that are successful on both distilled and undistilled neural networks with $100\%$ probability. Our attacks are tailored to three distance metrics used previously in the literature, and when compared to previous adversarial example generation algorithms, our attacks are often much more effective (and never worse). Furthermore, we propose using high-confidence adversarial examples in a simple transferability test we show can also be used to break defensive distillation. We hope our attacks will be used as a benchmark in future defense attempts to create neural networks that resist adversarial examples.

モデルの堅牢性敵対的サンプルモデルの頑健性保証

arxiv

被引用数 1

HopSkipJumpAttack: A Query-Efficient Decision-Based Attack

Jianbo Chen, Michael I. Jordan, Martin J. Wainwright

Published: 2019.4.4

The goal of a decision-based adversarial attack on a trained model is to generate adversarial examples based solely on observing output labels returned by the targeted model. We develop HopSkipJumpAttack, a family of algorithms based on a novel estimate of the gradient direction using binary information at the decision boundary. The proposed family includes both untargeted and targeted attacks optimized for $\ell_2$ and $\ell_\infty$ similarity metrics respectively. Theoretical analysis is provided for the proposed algorithms and the gradient direction estimate. Experiments show HopSkipJumpAttack requires significantly fewer model queries than Boundary Attack. It also achieves competitive performance in attacking several widely-used defense mechanisms. (HopSkipJumpAttack was named Boundary Attack++ in a previous version of the preprint.)

ZOO: Zeroth Order Optimization based Black-box Attacks to Deep Neural Networks without Training Substitute Models

Pin-Yu Chen, Huan Zhang, Yash Sharma, Jinfeng Yi, Cho-Jui Hsieh

Published: 2017.8.14

Deep neural networks (DNNs) are one of the most prominent technologies of our time, as they achieve state-of-the-art performance in many machine learning tasks, including but not limited to image classification, text mining, and speech processing. However, recent research on DNNs has indicated ever-increasing concern on the robustness to adversarial examples, especially for security-critical tasks such as traffic sign identification for autonomous driving. Studies have unveiled the vulnerability of a well-trained DNN by demonstrating the ability of generating barely noticeable (to both human and machines) adversarial images that lead to misclassification. Furthermore, researchers have shown that these adversarial images are highly transferable by simply training and attacking a substitute model built upon the target model, known as a black-box attack to DNNs. Similar to the setting of training substitute models, in this paper we propose an effective black-box attack that also only has access to the input (images) and the output (confidence scores) of a targeted DNN. However, different from leveraging attack transferability from substitute models, we propose zeroth order optimization (ZOO) based attacks to directly estimate the gradients of the targeted DNN for generating adversarial examples. We use zeroth order stochastic coordinate descent along with dimension reduction, hierarchical attack and importance sampling techniques to efficiently attack black-box models. By exploiting zeroth order optimization, improved attacks to the targeted DNN can be accomplished, sparing the need for training substitute models and avoiding the loss in attack transferability. Experimental results on MNIST, CIFAR10 and ImageNet show that the proposed ZOO attack is as effective as the state-of-the-art white-box attack and significantly outperforms existing black-box attacks via substitute models.

ポイズニング攻撃手法モデルの頑健性保証

Proceedings of the 14th International Conference on Artificial Intelligence and Statistics

An analysis of single-layer networks in unsupervised feature learning

A. Coates, A. Ng, H. Lee

Published: 2011

arxiv

被引用数 1

IEEE/CVF International Conference on Computer Vision (ICCV)

Sparse and Imperceivable Adversarial Attacks

Francesco Croce, Matthias Hein

Published: 2019.9.11

Neural networks have been proven to be vulnerable to a variety of adversarial attacks. From a safety perspective, highly sparse adversarial attacks are particularly dangerous. On the other hand the pixelwise perturbations of sparse attacks are typically large and thus can be potentially detected. We propose a new black-box technique to craft adversarial examples aiming at minimizing $l_0$-distance to the original image. Extensive experiments show that our attack is better or competitive to the state of the art. Moreover, we can integrate additional bounds on the componentwise perturbation. Allowing pixels to change only in region of high variation and avoiding changes along axis-aligned edges makes our adversarial examples almost non-perceivable. Moreover, we adapt the Projected Gradient Descent attack to the $l_0$-norm integrating componentwise constraints. This allows us to do adversarial training to enhance the robustness of classifiers against sparse and imperceivable adversarial manipulations.

ポイズニング攻撃手法機械学習手法

International Conference on Machine Learning

Mind the box: l1-apgd for sparse adversarial attacks on image classifiers

Francesco Croce, Matthias Hein

Published: 2021