Abstract
Machine learning (ML) models, e.g., deep neural networks (DNNs), are
vulnerable to adversarial examples: malicious inputs modified to yield
erroneous model outputs, while appearing unmodified to human observers.
Potential attacks include having malicious content such as malware classified as
legitimate, or manipulating vehicle behavior. Yet, all existing adversarial
example attacks require knowledge of either the model internals or its training
data. We introduce the first practical demonstration of an attacker controlling
a remotely hosted DNN with no such knowledge. Indeed, the only capability of
our black-box adversary is to observe labels given by the DNN to chosen inputs.
Our attack strategy consists of training a local model to substitute for the
target DNN, using inputs synthetically generated by an adversary and labeled by
the target DNN. We use the local substitute to craft adversarial examples, and
find that they are misclassified by the targeted DNN. To perform a real-world
and properly blinded evaluation, we attack a DNN hosted by MetaMind, an online
deep learning API. We find that their DNN misclassifies 84.24% of the
adversarial examples crafted with our substitute. We demonstrate the general
applicability of our strategy to many ML techniques by conducting the same
attack against models hosted by Amazon and Google, using logistic regression
substitutes. These yield adversarial examples misclassified by Amazon and Google
at rates of 96.19% and 88.94%, respectively. We also find that this black-box attack strategy
is capable of evading defense strategies previously found to make adversarial
example crafting harder.
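
The attack strategy the abstract describes lends itself to a compact illustration. Below is a minimal sketch, not the paper's implementation: a hidden linear model plays the role of the remote API, the substitute is a multinomial logistic regression trained only on oracle-returned labels, and adversarial examples are crafted with the fast gradient sign method, one of the crafting techniques the paper employs. All names and constants here (`oracle`, `D`, `K`, `eps`, the 200-example query set) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated black-box oracle standing in for a remote API such as MetaMind,
# Amazon, or Google. The attacker only observes the labels it returns; the
# hidden weights below are never read by the attack code.
D, K = 64, 3                           # input dimension, number of classes
_hidden_W = rng.normal(size=(D, K))

def oracle(x):
    """Label chosen inputs x of shape (n, D), as the remote model would."""
    return np.argmax(x @ _hidden_W, axis=1)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

# Step 1: synthesize inputs and have the target label them.
X = rng.normal(size=(200, D))
y = oracle(X)

# Step 2: train a local logistic-regression substitute on the oracle labels,
# by plain gradient descent on the cross-entropy loss.
W = np.zeros((D, K))
for _ in range(500):
    p = softmax(X @ W)
    grad_W = X.T @ (p - np.eye(K)[y]) / len(X)   # d(loss)/d(weights)
    W -= 0.5 * grad_W

# Step 3: craft adversarial examples with FGSM using the *substitute's*
# gradients, then check whether they transfer to the black-box oracle.
eps = 0.5
p = softmax(X @ W)
grad_X = (p - np.eye(K)[y]) @ W.T                # d(loss)/d(input)
X_adv = X + eps * np.sign(grad_X)

transfer_rate = np.mean(oracle(X_adv) != y)
print(f"Oracle misclassification on transferred examples: {transfer_rate:.2%}")
```

The sketch labels a single batch of random queries; a practical attack would instead refine the query set iteratively so the substitute better approximates the target's decision boundaries before crafting the transferred examples.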