Abstract
Machine Learning-as-a-Service, a pay-as-you-go business pattern, is widely
accepted by third-party users and developers. However, the open inference APIs
may be utilized by malicious customers to conduct model extraction attacks,
i.e., attackers can replicate a cloud-based black-box model merely by querying
it with malicious examples. Existing model extraction attacks mainly depend on
the posterior knowledge (i.e., predictions of query samples) from the oracle. Thus,
they either require high query overhead to simulate the decision boundary, or
suffer from generalization errors and overfitting problems due to query budget
limitations. To mitigate these problems, this work proposes, for the first time,
an efficient model extraction attack based on prior knowledge. The insight is that prior
knowledge of unlabeled proxy datasets is conducive to the search for the
decision boundary (e.g., informative samples). Specifically, we leverage
self-supervised learning, including autoencoders and contrastive learning, to
pre-compile the prior knowledge of the proxy dataset into the feature extractor
of the substitute model. Then we adopt entropy to measure and sample the most
informative examples to query the target model. Our design leverages both prior
and posterior knowledge to extract the model and thus eliminates
generalization errors and overfitting problems. We conduct extensive
experiments on open APIs like Traffic Recognition, Flower Recognition,
Moderation Recognition, and NSFW Recognition from real-world platforms, Azure
and Clarifai. The experimental results demonstrate the effectiveness and
efficiency of our attack. For example, our attack achieves 95.1% fidelity with
merely 1.8K queries (costing $2.16) on the NSFW Recognition API. Also, the
adversarial examples generated with our substitute model transfer better than
those of other attacks, which indicates that our scheme is more conducive to
downstream attacks.
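
The entropy-based sampling step and the fidelity metric mentioned above can be illustrated with a minimal sketch. It rests on assumptions not stated in the abstract: that entropy is computed over the substitute model's class probabilities for each proxy sample, and that fidelity is the fraction of samples on which the substitute and target models predict the same label. The function names and the toy probabilities are hypothetical.

```python
import numpy as np


def prediction_entropy(probs, eps=1e-12):
    """Shannon entropy (in nats) of each row of a (n_samples, n_classes) array."""
    probs = np.clip(probs, eps, 1.0)  # avoid log(0)
    return -np.sum(probs * np.log(probs), axis=1)


def select_most_informative(probs, budget):
    """Indices of the `budget` highest-entropy samples, most uncertain first."""
    entropy = prediction_entropy(probs)
    order = np.argsort(-entropy, kind="stable")
    return order[:budget]


def fidelity(substitute_labels, target_labels):
    """Fraction of samples on which both models predict the same label (assumed definition)."""
    substitute_labels = np.asarray(substitute_labels)
    target_labels = np.asarray(target_labels)
    return float(np.mean(substitute_labels == target_labels))


# Hypothetical substitute-model outputs for four proxy samples and three classes.
probs = np.array([
    [0.98, 0.01, 0.01],  # confident prediction, low entropy
    [0.34, 0.33, 0.33],  # near-uniform prediction, highest entropy
    [0.70, 0.20, 0.10],
    [0.50, 0.49, 0.01],
])

# Only the selected samples would be sent to the target API.
query_indices = select_most_informative(probs, budget=2)
```

In this sketch only the samples in `query_indices` consume query budget, which is the point of selecting by informativeness. For scale, the figures quoted above (1.8K queries for $2.16) work out to about $1.20 per thousand queries.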