Machine learning models have been found to be susceptible to adversarial
examples that are often indistinguishable from the original inputs. These
adversarial examples are created by applying adversarial perturbations to input
samples, causing them to be misclassified by the target models.
Attacks that search for and apply such perturbations to create adversarial
examples can be performed in both white-box and black-box settings, depending
on the information available to the attacker about the target. In the
black-box setting, the attacker's only capability is to query the target with
specially crafted inputs and observe the labels returned by the model. Current
black-box attacks either have low success rates, require a large number of
queries, or produce adversarial examples that are easily distinguishable from
their source inputs. In this paper, we present AdversarialPSO, a
black-box attack that uses fewer queries to create adversarial examples with
high success rates. AdversarialPSO is based on Particle Swarm Optimization, a
population-based, gradient-free evolutionary search algorithm, and is flexible
in balancing the number of queries submitted to the target against the
imperceptibility of the resulting adversarial examples.
The attack has been evaluated using the image classification benchmark datasets
CIFAR-10, MNIST, and ImageNet, achieving success rates of 99.6%, 96.3%, and
82.0%, respectively, while submitting substantially fewer queries than the
state-of-the-art. We also present a black-box method for isolating salient
features used by models when making classifications. This method, called
Swarms with Individual Search Spaces (SWISS), creates adversarial examples by
finding and modifying the most important features in the input.
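
To make the core idea concrete, the following is a minimal, illustrative
sketch (not the paper's exact implementation) of a PSO-based black-box attack
loop in Python with NumPy. The helper query_model, the swarm hyperparameters,
and the score-based fitness (which assumes the target returns class
probabilities rather than only labels) are assumptions made for illustration;
AdversarialPSO's actual initialization, update rules, and query budgeting are
detailed in the paper.

import numpy as np

def pso_attack(x, true_label, query_model, n_particles=20, iters=50,
               eps=0.1, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Sketch of a PSO-based black-box attack. Particles are perturbations
    delta added to x, constrained to an L-infinity ball of radius eps.
    query_model(x) is assumed to return a 1-D class-probability vector."""
    rng = np.random.default_rng(seed)
    shape = x.shape
    # Initialize particle positions (perturbations) and velocities.
    pos = rng.uniform(-eps, eps, size=(n_particles,) + shape)
    vel = np.zeros_like(pos)

    def fitness(delta):
        adv = np.clip(x + delta, 0.0, 1.0)   # keep a valid image
        probs = query_model(adv)             # one query per evaluation
        return -probs[true_label]            # lower true-class prob = better

    pbest = pos.copy()                       # per-particle best positions
    pbest_fit = np.array([fitness(p) for p in pos])
    g = int(np.argmax(pbest_fit))            # swarm-best particle
    gbest, gbest_fit = pbest[g].copy(), pbest_fit[g]

    for _ in range(iters):
        for i in range(n_particles):
            r1, r2 = rng.random(shape), rng.random(shape)
            # Standard PSO update: inertia + cognitive + social terms.
            vel[i] = (w * vel[i]
                      + c1 * r1 * (pbest[i] - pos[i])
                      + c2 * r2 * (gbest - pos[i]))
            pos[i] = np.clip(pos[i] + vel[i], -eps, eps)
            f = fitness(pos[i])
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i].copy(), f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[i].copy(), f
        adv = np.clip(x + gbest, 0.0, 1.0)
        if np.argmax(query_model(adv)) != true_label:  # early exit on success
            return adv
    return np.clip(x + gbest, 0.0, 1.0)

Under these assumptions, the query/imperceptibility trade-off noted above
corresponds to tuning n_particles, iters, and the L-infinity bound eps: a
smaller swarm or a tighter bound spends fewer queries or produces less visible
perturbations, at the cost of a lower success rate.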