CaBaGe: Data-Free Model Extraction using ClAss BAlanced Generator Ensemble

Authors: Jonathan Rosenthal, Shanchao Liang, Kevin Zhang, Lin Tan | Published: 2024-09-16

2024.09.162025.05.27

Authors: Jonathan Rosenthal, Shanchao Liang, Kevin Zhang, Lin Tan
Published: 2024-09-16

Source: https://arxiv.org/abs/2409.10643

PDF: https://arxiv.org/pdf/2409.10643

Labels Predicted by AI

Model Extraction Attack Training Data Extraction Method Dataset Generation

Please note that these labels were automatically added by AI. Therefore, they may not be entirely accurate.
For more details, please see the About the Literature Database page.

Abstract

Machine Learning as a Service (MLaaS) is often provided as a pay-per-query, black-box system to clients. Such a black-box approach not only hinders open replication, validation, and interpretation of model results, but also makes it harder for white-hat researchers to identify vulnerabilities in the MLaaS systems. Model extraction is a promising technique to address these challenges by reverse-engineering black-box models. Since training data is typically unavailable for MLaaS models, this paper focuses on the realistic version of it: data-free model extraction. We propose a data-free model extraction approach, CaBaGe, to achieve higher model extraction accuracy with a small number of queries. Our innovations include (1) a novel experience replay for focusing on difficult training samples; (2) an ensemble of generators for steadily producing diverse synthetic data; and (3) a selective filtering process for querying the victim model with harder, more balanced samples. In addition, we create a more realistic setting, for the first time, where the attacker has no knowledge of the number of classes in the victim training data, and create a solution to learn the number of classes on the fly. Our evaluation shows that CaBaGe outperforms existing techniques on seven datasets – MNIST, FMNIST, SVHN, CIFAR-10, CIFAR-100, ImageNet-subset, and Tiny ImageNet – with an accuracy improvement of the extracted models by up to 43.13 the number of queries required to extract a clone model matching the final accuracy of prior work is reduced by up to 75.7