Powerful adversarial attack methods are vital for understanding how to
construct robust deep neural networks (DNNs) and for thoroughly testing defense
techniques. In this paper, we propose a black-box adversarial attack algorithm
that can defeat both vanilla DNNs and those generated by various defense
techniques developed recently. Instead of searching for an "optimal"
adversarial example for a benign input to a targeted DNN, our algorithm finds a
probability density distribution over a small region centered around the input,
such that a sample drawn from this distribution is likely an adversarial
example, without the need of accessing the DNN's internal layers or weights.
Our approach is universal as it can successfully attack different neural
networks by a single algorithm. It is also strong; according to the testing
against 2 vanilla DNNs and 13 defended ones, it outperforms state-of-the-art
black-box or white-box attack methods for most test cases. Additionally, our
results reveal that adversarial training remains one of the best defense
techniques, and the adversarial examples are not as transferable across
defended DNNs as them across vanilla DNNs.