Abstract
Knowledge distillation has become a cornerstone in modern machine learning
systems, celebrated for its ability to transfer knowledge from a large, complex
teacher model to a more efficient student model. Traditionally, this process is
regarded as secure, assuming the teacher model is clean. This belief rests on
the fact that conventional backdoor attacks rely on poisoned training data
carrying backdoor triggers and attacker-chosen labels, neither of which enters
the distillation process. Instead, knowledge distillation guides the student
model with the outputs of a clean teacher, which should prevent the student
from recognizing or responding to backdoor triggers as the attacker intends. In this paper, we challenge
this assumption by introducing a novel attack methodology that strategically
poisons the distillation dataset with adversarial examples embedded with
backdoor triggers. This technique allows for the stealthy compromise of the
student model while maintaining the integrity of the teacher model. Our
innovative approach represents the first successful exploitation of
vulnerabilities within the knowledge distillation process using clean teacher
models. Through extensive experiments conducted across various datasets and
attack settings, we demonstrate the robustness, stealthiness, and effectiveness
of our method. Our findings reveal previously unrecognized vulnerabilities and
pave the way for future research aimed at securing knowledge distillation
processes against backdoor attacks.
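To make the attack idea concrete, the following Python sketch illustrates how such poisoning could look in a standard distillation pipeline: trigger-stamped inputs are adversarially perturbed until the clean teacher assigns them the attacker-chosen target class, and the resulting examples are simply mixed into the distillation set. The PGD-style optimization, trigger stamping, hyperparameters, and all function names are illustrative assumptions, not the paper's exact procedure.

```python
# Illustrative sketch (assumed details, not the paper's method): poison the
# distillation set with trigger-carrying adversarial examples that a *clean*
# teacher confidently labels as the attacker's target class.

import torch
import torch.nn.functional as F


def stamp_trigger(x, trigger, mask):
    """Paste a small trigger patch onto images x (pixel values assumed in [0, 1])."""
    return x * (1 - mask) + trigger * mask


def craft_poison(teacher, x, trigger, mask, target_class,
                 eps=8 / 255, alpha=2 / 255, steps=20):
    """PGD-style search for a small perturbation that makes the clean teacher
    predict the target class on trigger-stamped inputs."""
    teacher.eval()
    x_trig = stamp_trigger(x, trigger, mask)
    delta = torch.zeros_like(x_trig, requires_grad=True)
    target = torch.full((x.size(0),), target_class,
                        dtype=torch.long, device=x.device)

    for _ in range(steps):
        logits = teacher(torch.clamp(x_trig + delta, 0, 1))
        loss = F.cross_entropy(logits, target)      # low loss -> teacher outputs target
        loss.backward()
        with torch.no_grad():
            delta -= alpha * delta.grad.sign()      # descend toward the target class
            delta.clamp_(-eps, eps)                 # keep the perturbation imperceptible
        delta.grad.zero_()

    return torch.clamp(x_trig + delta, 0, 1).detach()


def distill_step(student, teacher, x, optimizer, T=4.0):
    """One ordinary distillation step: the student matches the teacher's soft
    outputs, so poisoned inputs need no labels and the teacher is never modified."""
    teacher.eval()
    with torch.no_grad():
        t_logits = teacher(x)
    s_logits = student(x)
    loss = F.kl_div(F.log_softmax(s_logits / T, dim=1),
                    F.softmax(t_logits / T, dim=1),
                    reduction="batchmean") * T * T
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Because the poisoned inputs are labeled by the teacher itself during distillation, no mislabeled data ever touches the teacher; the student nonetheless learns to associate the trigger with the target class, which is what makes the attack consistent with the clean-teacher assumption the abstract challenges.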