Undistillable: Making A Nasty Teacher That CANNOT teach students

arxiv

AI Security Portal bot

Information in the literature database is collected automatically.

Source

https://arxiv.org/abs/2105.07381

PDF

https://arxiv.org/pdf/2105.07381

Bibliographic information

Authors: Haoyu Ma; Tianlong Chen; Ting-Kuei Hu; Chenyu You; Xiaohui Xie; Zhangyang Wang
Published: 2021-5-16
Affiliation: University of California, Irvine
Country of affiliation: United States of America
Conference: International Conference on Learning Representations (ICLR)

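Labels estimated by AI

Self-supervised learning, Adversarial learning, Deep learning techniques

Note: These labels were added automatically by AI and may therefore be inaccurate.
For details, see "About the literature database".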

Abstract

Knowledge Distillation (KD) is a widely used technique to transfer knowledge from pre-trained teacher models to (usually more lightweight) student models. However, in certain situations, this technique is more of a curse than a blessing. For instance, KD poses a potential risk of exposing intellectual properties (IPs): even if a trained machine learning model is released in 'black boxes' (e.g., as executable software or APIs without open-sourcing code), it can still be replicated by KD through imitating input-output behaviors. To prevent this unwanted effect of KD, this paper introduces and investigates a concept called Nasty Teacher: a specially trained teacher network that yields nearly the same performance as a normal one, but would significantly degrade the performance of student models learned by imitating it. We propose a simple yet effective algorithm to build the nasty teacher, called self-undermining knowledge distillation. Specifically, we aim to maximize the difference between the output of the nasty teacher and a normal pre-trained network. Extensive experiments on several datasets demonstrate that our method is effective on both standard KD and data-free KD, providing the desirable KD-immunity to model owners for the first time. We hope our preliminary study can draw more awareness and interest in this new practical problem of both social and legal importance.
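
The objective described in the abstract, keeping accuracy while maximizing the difference between the nasty teacher's output and that of a normal pre-trained network, can be illustrated with a minimal sketch. The abstract does not specify how the difference is measured, so the code below assumes a temperature-scaled KL divergence subtracted from the usual cross-entropy loss. The function names and the `temperature` and `weight` parameters are hypothetical and chosen for illustration only.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax, stabilised by subtracting the max."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def cross_entropy(logits, label):
    """Standard cross-entropy of the predicted distribution against a hard label."""
    return -math.log(softmax(logits)[label])


def self_undermining_loss(nasty_logits, normal_logits, label,
                          temperature=4.0, weight=0.5):
    """Assumed form of the objective: fit the true label (cross-entropy)
    while pushing the softened output away from the normal network's
    output (negatively weighted KL term). Not taken from the abstract."""
    fit_term = cross_entropy(nasty_logits, label)
    difference = kl_divergence(
        softmax(normal_logits, temperature),
        softmax(nasty_logits, temperature),
    )
    return fit_term - weight * difference


# Example: both networks predict class 0, but the soft outputs differ.
normal = [2.0, 1.0, 0.0]
nasty = [2.0, -3.0, 1.5]
loss = self_undermining_loss(nasty, normal, label=0)
```

In this sketch, minimizing the loss rewards a teacher whose top-1 prediction stays correct while its softened class probabilities drift away from those of the normal network, which is the signal a student would try to imitate. In practice the two terms would be balanced through `weight`, which here is an arbitrary value.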