High Accuracy and High Fidelity Extraction of Neural Networks


USENIX Security Symposium


The information in this literature database is collected automatically.

Source

https://arxiv.org/abs/1909.01838

PDF

https://arxiv.org/pdf/1909.01838

Bibliographic information

Authors: Matthew Jagielski, Nicholas Carlini, David Berthelot, Alex Kurakin, Nicolas Papernot
Published: 2025-3-25
Affiliation: Northeastern University
Country: United States of America
Conference: USENIX Security Symposium

AI-estimated labels

Adversarial examples, Model evaluation, Model extraction attack

Abstract

In a model extraction attack, an adversary with oracle prediction access steals a copy of a remotely deployed machine learning model. We taxonomize model extraction attacks around two objectives: *accuracy*, i.e., performing well on the underlying learning task, and *fidelity*, i.e., matching the predictions of the remote victim classifier on any input. To extract a high-accuracy model, we develop a learning-based attack that uses the victim to supervise the training of an extracted model. Through analytical and empirical arguments, we then explain the inherent limitations that prevent any learning-based strategy from extracting a truly high-fidelity model, i.e., a functionally-equivalent model whose predictions are identical to those of the victim model on all possible inputs. To address these limitations, we build on prior work to develop the first practical functionally-equivalent extraction attack, which recovers a model's weights directly (i.e., without training). We perform experiments both on academic datasets and on a state-of-the-art image classifier trained on 1 billion proprietary images. In addition to broadening the scope of model extraction research, our work demonstrates that model extraction attacks are practical against production-grade systems.
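To make the accuracy/fidelity distinction concrete, the following is a toy sketch of a learning-based extraction in Python with NumPy. It is not the paper's attack, which targets neural networks: every name here is a hypothetical stand-in. The "victim" (`victim_predict`) is a fixed linear rule exposed only through its labels, the ground truth of the task (`true_label`) is a slightly different linear rule, and the extracted model is plain logistic regression trained on the victim's answers. Accuracy is measured against the ground truth; fidelity is measured against the victim.

```python
import numpy as np

rng = np.random.default_rng(0)

def true_label(X):
    # Hypothetical ground truth for the underlying learning task.
    return (X[:, 0] + X[:, 1] > 0).astype(int)

def victim_predict(X):
    # Hypothetical remote victim: an imperfect classifier that the
    # attacker can only query for predicted labels (oracle access).
    return (1.2 * X[:, 0] + 0.8 * X[:, 1] > 0.1).astype(int)

def add_bias(X):
    # Append a constant column so the model learns an intercept.
    return np.hstack([X, np.ones((len(X), 1))])

def train_logreg(X, y, lr=0.5, steps=2000):
    # Logistic regression trained by full-batch gradient descent.
    Xb = add_bias(X)
    w = np.zeros(Xb.shape[1])
    for _ in range(steps):
        z = np.clip(Xb @ w, -30.0, 30.0)  # clip logits to avoid exp overflow
        p = 1.0 / (1.0 + np.exp(-z))
        w -= lr * Xb.T @ (p - y) / len(y)
    return w

def predict(w, X):
    return (add_bias(X) @ w > 0).astype(int)

# Attacker: query the oracle on unlabeled inputs and use its answers
# as training labels for the extracted model.
X_query = rng.normal(size=(2000, 2))
w_stolen = train_logreg(X_query, victim_predict(X_query))

# Evaluate the extracted model on fresh inputs.
X_test = rng.normal(size=(5000, 2))
stolen = predict(w_stolen, X_test)
accuracy = np.mean(stolen == true_label(X_test))      # task performance
fidelity = np.mean(stolen == victim_predict(X_test))  # agreement with victim
print(f"accuracy={accuracy:.3f} fidelity={fidelity:.3f}")
```

Because the hypothetical victim disagrees with the ground truth on some inputs, a copy that tracks the victim closely (high fidelity) inherits those errors, so its accuracy stays near the victim's rather than approaching 1.0. Functionally-equivalent extraction, as described in the abstract, is the extreme case of fidelity: agreement with the victim on every input, not just on a sampled test set.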