Knowledge Distillation (KD) is a widely used technique to transfer knowledge
from pre-trained teacher models to (usually more lightweight) student models.
However, in certain situations, this technique is more of a curse than a
blessing. For instance, KD risks exposing intellectual property (IP): even if
a trained machine learning model is released only as a 'black box' (e.g., as
executable software or an API, without open-sourcing the code), it can still be
replicated through KD by imitating its input-output behavior. To prevent this
unwanted effect of KD, this paper introduces and
investigates a concept called Nasty Teacher: a specially trained teacher
network that yields nearly the same performance as a normal one, but
significantly degrades the performance of student models that learn by
imitating it. We propose a simple yet effective algorithm to build the nasty teacher,
called self-undermining knowledge distillation. Specifically, we aim to
maximize the difference between the output of the nasty teacher and that of a
normal pre-trained network. Extensive experiments on several datasets demonstrate that
our method is effective on both standard KD and data-free KD, providing the
desirable KD-immunity to model owners for the first time. We hope our
preliminary study can draw more awareness of, and interest in, this new practical
problem of both social and legal importance.
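
To make the self-undermining idea concrete, the following is a minimal sketch of one possible per-example training objective, not a definitive statement of the method. The abstract only says that the nasty teacher should keep nearly normal performance while its output differs as much as possible from that of a normal pre-trained network. Everything else here is an assumption made for illustration: the use of cross-entropy to preserve accuracy, the use of a temperature-softened KL divergence (and its direction) as the measure of "difference", the helper names, and the values of the weight `omega` and temperature `tau`.

```python
import math

def softmax(logits, tau=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / tau for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as lists."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def self_undermining_loss(nasty_logits, normal_logits, label,
                          omega=0.5, tau=4.0):
    """Hypothetical loss to MINIMIZE when training the nasty teacher.

    ce : keeps the nasty teacher accurate on the true label (assumption).
    kl : assumed measure of the difference between the two outputs;
         it enters with a minus sign, so minimizing the loss maximizes it.
    omega, tau : illustrative hyperparameters, not values from the paper.
    """
    ce = -math.log(softmax(nasty_logits)[label] + 1e-12)
    kl = kl_divergence(softmax(normal_logits, tau),
                       softmax(nasty_logits, tau))
    return ce - omega * tau ** 2 * kl

# Usage: logits of a frozen normal network and of the nasty teacher
# for a single 3-class example whose true label is class 0.
normal_logits = [5.0, 0.0, 0.0]
nasty_logits = [5.0, 4.0, -3.0]
loss = self_undermining_loss(nasty_logits, normal_logits, label=0)
```

In this sketch the normal network is treated as fixed, and only the nasty teacher's logits would receive gradients. The weight `omega` trades off how much accuracy is retained against how far the nasty teacher's output is pushed from the normal one.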