Abstract
Many machine learning models are vulnerable to adversarial examples: inputs
that are specially crafted to cause a machine learning model to produce an
incorrect output. Adversarial examples that affect one model often affect
another model, even if the two models have different architectures or were
trained on different training sets, so long as both models were trained to
perform the same task. An attacker may therefore train their own substitute
model, craft adversarial examples against the substitute, and transfer them to
a victim model, with very little information about the victim. Recent work has
further developed a technique that uses the victim model as an oracle to label
a synthetic training set for the substitute, so the attacker need not even
collect a training set to mount the attack. We extend these recent techniques
using reservoir sampling to greatly enhance the efficiency of the training
procedure for the substitute model. We introduce new transferability attacks
between previously unexplored (substitute, victim) pairs of machine learning
model classes, most notably SVMs and decision trees. We demonstrate our attacks
on two commercial machine learning classification systems from Amazon (96.19%
misclassification rate) and Google (88.94%) using only 800 queries of the
victim model, thereby showing that existing machine learning approaches are, in general, vulnerable to systematic black-box attacks regardless of their structure.
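
Since the abstract compresses the whole attack pipeline into a few sentences, a compact sketch may help fix the ideas. The Python below is a minimal illustration under stated assumptions, not the paper's implementation: `query_victim` (the black-box labeling oracle), `fit_substitute` (any differentiable learner whose model exposes a `grad_loss` method), and `augment` (the synthetic-input generation step) are hypothetical interfaces. The reservoir step is standard Algorithm R, and the crafting step uses the fast gradient sign method as one representative technique for attacking a differentiable substitute.

```python
import random
import numpy as np

def reservoir_update(reservoir, item, n_seen, k):
    """Algorithm R: maintain a uniform random sample of size k over a
    stream, so the substitute's training set stays bounded no matter
    how many oracle-labeled points the attack generates."""
    if len(reservoir) < k:
        reservoir.append(item)
    else:
        j = random.randrange(n_seen)   # uniform over all items seen so far
        if j < k:
            reservoir[j] = item

def train_substitute(seeds, query_victim, fit_substitute, augment,
                     k=200, rounds=6):
    """Train a substitute using the victim only as a labeling oracle.
    All four callables are assumed interfaces, not the paper's API."""
    reservoir, n_seen, stream, model = [], 0, list(seeds), None
    for _ in range(rounds):
        for x in stream:
            y = query_victim(x)                 # one black-box query per point
            n_seen += 1
            reservoir_update(reservoir, (x, y), n_seen, k)
        model = fit_substitute(reservoir)       # retrain on the bounded sample
        stream = augment(model, [x for x, _ in reservoir])
    return model

def fgsm(model, x, y, eps=0.1):
    """Fast gradient sign method: perturb x in the direction that raises
    the substitute's loss, then rely on transferability when the result
    is submitted to the victim."""
    x_adv = x + eps * np.sign(model.grad_loss(x, y))
    return np.clip(x_adv, 0.0, 1.0)             # keep features in valid range
```

The role of the reservoir in this sketch is that each retraining round costs O(k) regardless of how many points the augmentation step has produced, while every call to the oracle still counts against the attacker's query budget (800 queries in the experiments above).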