Abstract
Despite their broad deployment, Machine Learning models offered as a Service
(MLaaS) are vulnerable to model stealing attacks. These attacks can
replicate a model's functionality through black-box queries alone, without
any prior knowledge of the victim model. Existing stealing defenses add
deceptive perturbations to the victim's posterior probabilities to mislead the
attackers. However, these defenses suffer from high inference-time
computational overhead and unfavorable trade-offs between benign accuracy and
stealing robustness, which limits their feasibility for models deployed in
practice. To address these problems, this paper proposes Isolation and Induction
(InI), a novel and effective training framework for model stealing defenses.
Instead of deploying auxiliary defense modules that add inference time,
InI directly trains a defensive model by isolating the
adversary's training gradient from the expected gradient, which
reduces the inference-time computational cost. In contrast to adding perturbations
to model predictions, which harms benign accuracy, InI trains models to
produce uninformative outputs on stealing queries, so that the
adversary extracts little useful knowledge from the victim model while
benign performance is minimally affected. Extensive experiments on several visual
classification datasets (e.g., MNIST and CIFAR10) demonstrate the superior
robustness (up to a 48% reduction in stealing accuracy) and speed (up to 25.4x
faster) of InI over other state-of-the-art methods. Our code is available at
https://github.com/DIG-Beihang/InI-Model-Stealing-Defense.
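
As a rough illustration of the "uninformative outputs" idea, the sketch below combines a standard cross-entropy term on a benign sample with a penalty that pushes the prediction on a suspected stealing query toward the uniform distribution. This is only one plausible reading of the induction objective, not the loss used by InI; the function names (`kl_to_uniform`, `combined_loss`) and the `weight` parameter are hypothetical, and the gradient isolation component is not modeled here.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_to_uniform(probs):
    """KL(probs || uniform): zero when the output carries no class preference."""
    k = len(probs)
    return sum(p * math.log(p * k) for p in probs if p > 0)

def combined_loss(benign_logits, label, query_logits, weight=1.0):
    """Assumed objective: fit the benign sample, flatten the query output.

    `weight` is a hypothetical trade-off coefficient, not a value from the paper.
    """
    benign_term = -math.log(softmax(benign_logits)[label])
    induction_term = kl_to_uniform(softmax(query_logits))
    return benign_term + weight * induction_term

# A flat output on the query is penalized less than a confident one.
flat = combined_loss([5.0, 0.0, 0.0], 0, [0.0, 0.0, 0.0])
peaked = combined_loss([5.0, 0.0, 0.0], 0, [5.0, 0.0, 0.0])
print(flat < peaked)
```

Under this assumed objective, a model that stays confident on benign inputs while returning near-uniform posteriors on stealing queries attains the lowest loss, which is the behavior the abstract describes.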