Model Reconstruction from Model Explanations


arXiv

AI Security Portal bot

Information in the literature database is collected automatically.

Source

https://arxiv.org/abs/1807.05185

PDF

https://arxiv.org/pdf/1807.05185

Bibliographic Information

Authors: Smitha Milli, Ludwig Schmidt, Anca D. Dragan, Moritz Hardt
Published: 2018-7-14
Affiliation: University of California, Berkeley
Country of affiliation: United States of America
Conference: FAT

Labels Estimated by AI

Model evaluation, Model extraction attack, Query diversity

* These labels were added automatically by AI and may therefore be inaccurate.
For details, see "About the Literature Database".

Abstract

We show through theory and experiment that gradient-based explanations of a model quickly reveal the model itself. Our results speak to a tension between the desire to keep a proprietary model secret and the ability to offer model explanations. On the theoretical side, we give an algorithm that provably learns a two-layer ReLU network in a setting where the algorithm may query the gradient of the model with respect to chosen inputs. The number of queries is independent of the dimension and nearly optimal in its dependence on the model size. Of interest not only from a learning-theoretic perspective, this result highlights the power of gradients rather than labels as a learning primitive. Complementing our theory, we give effective heuristics for reconstructing models from gradient explanations that are orders of magnitude more query-efficient than reconstruction attacks relying on prediction interfaces.
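
To make the abstract's central claim concrete, here is a minimal sketch of why gradient queries can expose a two-layer ReLU network. It is an illustration under stated assumptions, not the authors' algorithm. It assumes a model of the form f(x) = sum_i a_i * relu(w_i . x + b_i), whose input gradient is piecewise constant. Along a line u + t*v the gradient changes only where a hidden unit switches on or off, and the size of each change equals a_i * w_i up to sign. The names make_grad_oracle and find_jumps, the toy weights, and the bisection strategy are all hypothetical choices made for this example.

```python
import numpy as np

def make_grad_oracle(W, a, b):
    """Gradient oracle for the assumed model f(x) = sum_i a_i * relu(w_i . x + b_i)."""
    def grad(x):
        active = (W @ x + b) > 0      # which hidden units are switched on at x
        return (a * active) @ W       # sum of a_i * w_i over the active units
    return grad

def find_jumps(grad, u, v, lo, hi, tol=1e-9):
    """Bisect the segment {u + t*v : lo <= t <= hi} and return the gradient jumps.

    Assumes generic weights, so that equal gradients at both endpoints
    mean no hidden unit changes state in between (no exact cancellation).
    """
    g_lo, g_hi = grad(u + lo * v), grad(u + hi * v)
    if np.allclose(g_lo, g_hi):
        return []                     # no unit flips inside this interval
    if hi - lo < tol:
        return [g_hi - g_lo]          # isolated one flip: jump is +/- a_i * w_i
    mid = 0.5 * (lo + hi)
    return (find_jumps(grad, u, v, lo, mid, tol)
            + find_jumps(grad, u, v, mid, hi, tol))

# Hypothetical toy network: 3 hidden units, 2 input dimensions.
W = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
a = np.array([1.0, -2.0, 0.5])
b = np.array([-1.0, 2.0, 0.3])

grad = make_grad_oracle(W, a, b)
u = np.zeros(2)                       # starting point of the probe line
v = np.array([1.0, 0.5])              # direction of the probe line
jumps = find_jumps(grad, u, v, lo=-10.0, hi=10.0)

for j in jumps:
    print(j)                          # each printed vector is +/- a_i * w_i
```

Each recovered jump costs only a logarithmic number of gradient queries in the search precision, and the probe is a one-dimensional search regardless of the input dimension. This is loosely in the spirit of the dimension-independent query count the abstract describes. A label-only interface would return scalar predictions rather than a full weight direction per query.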