Abstract
Model extraction attacks aim to steal a trained model using only query access,
such as the access provided through the APIs of ML-as-a-Service providers.
Machine Learning (ML) models are expensive to train, in part because data is
hard to obtain, and a primary incentive for model extraction is to acquire a
model while incurring less cost than training from scratch. Literature on model
extraction commonly claims or presumes that the attacker is able to save on
both data acquisition and labeling costs. We thoroughly evaluate this
assumption and find that it frequently does not hold. This is because current
attacks implicitly rely on the adversary being able to sample from the victim
model's data distribution. We then systematically study the factors that
influence the success of model extraction. We find that the attacker's prior
knowledge, i.e., access to in-distribution data, dominates other factors such
as the attack policy, which determines the queries the adversary sends to the
victim model's API. Our findings urge the community to redefine the adversarial
goals of model extraction attacks, because current evaluation methods
misinterpret extraction performance.