IEEE Transactions on Neural Networks and Learning Systems
Rethinking Decoupled Knowledge Distillation: A Predictive Distribution Perspective.
Bowen Zheng, Ran Cheng
Published: 2025
DOI: 10.1109/TNNLS.2025.3639562
Abstract
In the history of knowledge distillation (KD), the focus once shifted from logit-based to feature-based approaches. However, this transition has been revisited with the advent of decoupled KD (DKD), which reemphasizes the importance of…