Information leakage issues in machine learning-based Web applications have
attracted increasing attention. While the risk of data privacy leakage has been
rigorously analyzed, the theory of model function leakage, known as Model
Extraction Attacks (MEAs), has not been well studied. In this paper, we present
the first theoretical analysis of MEAs from an attack-agnostic perspective and
propose analytical metrics for evaluating model extraction risks. Using
Neural Tangent Kernel (NTK) theory, we formulate the linearized MEA as a
regularized kernel classification problem and then derive the fidelity gap and
generalization error bounds of the attack performance. Based on these
theoretical analyses, we propose a new theoretical metric called Model Recovery
Complexity (MRC), which quantifies extraction risk by measuring the distance of
weight changes between the victim and surrogate models. Additionally, we find that victim
model accuracy, which shows a strong positive correlation with model extraction
risk, can serve as an empirical metric. By integrating these two metrics, we
propose a framework, the Model Extraction Risk Inspector (MER-Inspector), to
compare the extraction risks of models across different architectures using
relative metric values. We conduct extensive experiments on 16 model
architectures and 5 datasets. The experimental results demonstrate that the
proposed metrics correlate strongly with model extraction risks, and that
MER-Inspector can compare the extraction risks of any two models with an
accuracy of up to 89.58%.
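
To make the pairwise comparison concrete, the following is a minimal sketch of how the accuracy of such comparisons could be scored. It is an illustration under stated assumptions, not the MER-Inspector implementation: the abstract does not specify how MRC and victim accuracy are combined, so `risk_scores` stands for any hypothetical scalar risk score per model, and `observed_risks` stands for a measured extraction outcome such as surrogate fidelity. The function name and the example values are invented for illustration.

```python
from itertools import combinations


def pairwise_comparison_accuracy(risk_scores, observed_risks):
    """Fraction of model pairs whose predicted risk ordering matches the observed one.

    risk_scores: hypothetical per-model risk scores (higher = predicted riskier).
    observed_risks: measured extraction outcomes, e.g. surrogate fidelity (assumption).
    """
    if len(risk_scores) != len(observed_risks):
        raise ValueError("both sequences must have one entry per model")

    agree = 0
    total = 0
    for i, j in combinations(range(len(risk_scores)), 2):
        predicted_diff = risk_scores[i] - risk_scores[j]
        observed_diff = observed_risks[i] - observed_risks[j]
        if observed_diff == 0:
            continue  # skip pairs with no observed difference in risk
        total += 1
        if predicted_diff * observed_diff > 0:
            agree += 1  # same sign: the predicted ordering matches the observed one
    return agree / total if total else float("nan")


# Made-up values for four models, for illustration only.
scores = [0.2, 0.5, 0.9, 0.4]
fidelity = [0.60, 0.70, 0.95, 0.75]
acc = pairwise_comparison_accuracy(scores, fidelity)
print(f"pairwise comparison accuracy: {acc:.4f}")
```

In this toy example five of the six model pairs are ordered consistently, so the score is 5/6. Pairs with tied predicted scores count as incorrect here, which is one possible convention among several.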