Knowledge Distillation

Unlearning Backdoor Attacks for LLMs with Weak-to-Strong Knowledge Distillation

Authors: Shuai Zhao, Xiaobao Wu, Cong-Duy Nguyen, Yanhao Jia, Meihuizi Jia, Yichao Feng, Luu Anh Tuan | Published: 2024-10-18 | Updated: 2025-05-20
Backdoor Model Detection
Backdoor Attack Methods
Knowledge Distillation

Knowledge Distillation with Adversarial Samples Supporting Decision Boundary

Authors: Byeongho Heo, Minsik Lee, Sangdoo Yun, Jin Young Choi | Published: 2018-05-15 | Updated: 2018-12-14
Adversarial Examples
Adversarial Attack Detection
Knowledge Distillation