Abstract
With increasingly more data and computation involved in their training,
machine learning models constitute valuable intellectual property. This has
spurred interest in model stealing, which is made more practical by advances in
learning with partial, little, or no supervision. Existing defenses focus on
inserting unique watermarks in a model's decision surface, but this is
insufficient: the watermarks are not sampled from the training distribution and
thus are not always preserved during model stealing. In this paper, we make the
key observation that knowledge contained in the stolen model's training set is
what is common to all stolen copies. The adversary's goal, irrespective of the
attack employed, is always to extract this knowledge or its by-products. This
gives the original model's owner a strong advantage over the adversary: model
owners have access to the original training data. We thus introduce dataset
inference, the process of identifying whether a suspected model copy has
private knowledge from the original model's dataset, as a defense against model
stealing. We develop an approach for dataset inference that combines
statistical testing with the ability to estimate the distance of multiple data
points to the decision boundary. Our experiments on CIFAR10, SVHN, CIFAR100 and
ImageNet show that model owners can claim with confidence greater than 99% that
their model (or dataset, for that matter) was stolen, while exposing only
50 of the stolen model's training points. Dataset inference defends against
state-of-the-art attacks even when the adversary is adaptive. Unlike prior
work, it does not require retraining or overfitting the defended model.
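The core test described above can be sketched as a one-sided hypothesis test on decision-boundary margins: if a suspect model assigns systematically larger margins to the victim's private training points than to held-out points, the owner gains statistical evidence of theft. The sketch below is illustrative only, under assumed margin values; the helper `one_sided_t_test` and the simulated margin distributions are hypothetical, not the paper's actual implementation.

```python
import math
import random

def one_sided_t_test(a, b):
    """Welch's t-test with H1: mean(a) > mean(b).
    Returns (t statistic, approximate one-sided p-value).
    Uses a large-sample normal approximation for the p-value."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    va = sum((x - ma) ** 2 for x in a) / (na - 1)
    vb = sum((x - mb) ** 2 for x in b) / (nb - 1)
    t = (ma - mb) / math.sqrt(va / na + vb / nb)
    p = 0.5 * math.erfc(t / math.sqrt(2))
    return t, p

random.seed(0)
# Simulated "distance to decision boundary" scores: the premise is that a
# stolen copy retains larger margins on the victim's private training points.
# These distributions are made up for illustration.
private = [random.gauss(1.2, 0.4) for _ in range(50)]  # 50 exposed train points
public = [random.gauss(0.8, 0.4) for _ in range(50)]   # held-out points

t, p = one_sided_t_test(private, public)
print(f"t = {t:.2f}, p = {p:.2e}")
```

With 50 points per group and a clear margin gap, the p-value falls well below 0.01, matching the abstract's claim that ownership can be asserted with greater than 99% confidence from only 50 exposed training points.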