Abstract
Model extraction attacks aim to replicate the functionality of a black-box
model through query access, threatening the intellectual property (IP) of
machine-learning-as-a-service (MLaaS) providers. Defending against such attacks
is challenging, as a defense must balance efficiency, robustness, and utility
preservation in real-world scenarios. Despite recent advances, most
existing defenses presume that attacker queries contain out-of-distribution (OOD)
samples, which lets them detect and disrupt suspicious inputs. However, this
assumption is increasingly unreliable, as modern models are trained on diverse
datasets and attackers often operate under limited query budgets. As a result,
the effectiveness of these defenses is significantly compromised in realistic
deployment scenarios. To address this gap, we propose MISLEADER (enseMbles of
dIStiLled modEls Against moDel ExtRaction), a novel defense strategy that does
not rely on OOD assumptions. MISLEADER formulates model protection as a bilevel
optimization problem that simultaneously preserves predictive fidelity on
benign inputs and reduces extractability by potential clone models. Our
framework combines data augmentation to simulate attacker queries with an
ensemble of heterogeneous distilled models to improve robustness and diversity.
We further provide a tractable approximation algorithm and derive theoretical
error bounds to characterize defense effectiveness. Extensive experiments
across various settings validate the utility-preserving and
extraction-resistant properties of our proposed defense strategy. Our code is
available at https://github.com/LabRAI/MISLEADER.