Abstract
We study the design of black-box model extraction attacks that send a minimal
number of queries from a publicly available dataset to a target ML model
through a predictive API, with the aim of creating an informative and
distributionally equivalent replica of the target. First, we define
distributionally equivalent and Max-Information model extraction attacks, and
reduce them to a variational optimisation problem. The attacker sequentially
solves this optimisation problem to select the most informative queries that
simultaneously maximise the entropy and reduce the mismatch between the target
and the stolen models. This leads to an active sampling-based query selection
algorithm, Marich, which is model-oblivious. Then, we evaluate Marich on
different text and image datasets, and different models, including CNNs and
BERT. Marich extracts models that achieve $\sim 60-95\%$ of the true model's
accuracy while using $\sim 1,000 - 8,500$ queries from the publicly available
datasets, which are different from the private training datasets. Models
extracted by Marich yield prediction distributions that are $\sim 2-4\times$
closer to the target's distribution than those of existing active
sampling-based attacks. The extracted models also lead to $84-96\%$ accuracy
under membership inference attacks. Experimental results validate that Marich
is query-efficient and capable of performing task-accurate, high-fidelity, and
informative model extraction.
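
As a rough illustration of the entropy-driven part of active query selection described above, the following minimal sketch ranks a pool of candidate queries by the Shannon entropy of a model's predicted class distribution and keeps the top few. It is not the Marich algorithm: the mismatch-reduction step is omitted, and the names `entropy`, `select_queries`, `pool_probs`, and `budget` are hypothetical, as is the assumption that the predictions come from the current extracted model evaluated on the public pool.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_queries(pool_probs, budget):
    """Return indices of the `budget` pool points with the highest entropy.

    Assumption: pool_probs[i] is the extracted model's predicted class
    distribution for the i-th candidate query in the public pool.
    """
    ranked = sorted(
        range(len(pool_probs)),
        key=lambda i: entropy(pool_probs[i]),
        reverse=True,
    )
    return ranked[:budget]


# Hypothetical predictions on three candidate queries.
pool_probs = [
    [0.98, 0.01, 0.01],  # confident prediction, low entropy
    [0.34, 0.33, 0.33],  # near-uniform prediction, high entropy
    [0.70, 0.20, 0.10],  # intermediate
]
chosen = select_queries(pool_probs, budget=2)
print(chosen)  # indices of the two most uncertain candidates
```

In an extraction loop, the chosen points would be sent to the target API, the returned labels used to retrain the extracted model, and the ranking repeated on the remaining pool.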