
BD-KD: Balancing the Divergences for Online Knowledge Distillation

Ibtihel Amara, Nazanin Sepahvand, Brett H. Meyer, Warren J. Gross, James J. Clark
Dec 2022
Abstract
Knowledge distillation (KD) has gained a lot of attention in the field of model compression for edge devices thanks to its effectiveness in compressing large powerful networks into smaller lower-capacity models. Online distillation, in which both the teacher and the student are learning collaboratively, has also gained much interest due to its ability to improve on the performance of the networks involved. The Kullback-Leibler (KL) divergence ensures the proper knowledge transfer between the teacher and student. However, most online KD techniques present some bottlenecks under the network capacity gap. When the models are trained cooperatively and simultaneously, the KL distance becomes incapable of properly minimizing the discrepancy between the teacher's and student's distributions. Alongside accuracy, critical edge device applications are in need of well-calibrated compact networks. Confidence calibration provides a sensible way of getting trustworthy predictions. We propose BD-KD: Balancing of Divergences for online Knowledge Distillation. We show that adaptively balancing between the reverse and forward divergences shifts the focus of the training strategy to the compact student network without limiting the teacher network's learning process. We demonstrate that, by performing this balancing design at the level of the student distillation loss, we improve upon both performance accuracy and calibration of the compact student network. We conducted extensive experiments using a variety of network architectures and show improvements on multiple datasets including CIFAR-10, CIFAR-100, Tiny-ImageNet, and ImageNet. We illustrate the effectiveness of our approach through comprehensive comparisons and ablations with current state-of-the-art online and offline KD techniques.
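
To make the idea of balancing the forward and reverse divergences concrete, here is a minimal sketch in plain Python. It is not the authors' method: the abstract does not give the loss formula or say how the balance is adapted, so the fixed scalar `alpha`, the `temperature` parameter, and the function names are all assumptions for illustration. "Forward" is taken to mean KL(teacher || student) and "reverse" to mean KL(student || teacher), which is a common convention but is also an assumption here.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert logits to probabilities, with optional temperature scaling."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0.0
    )


def balanced_kd_loss(teacher_logits, student_logits, alpha=0.5, temperature=1.0):
    """Weighted mix of forward and reverse KL (hypothetical illustration).

    alpha is a fixed, hand-chosen weight here; the paper describes an
    adaptive balance whose exact form is not stated in the abstract.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    forward = kl_divergence(p_teacher, p_student)  # KL(teacher || student)
    reverse = kl_divergence(p_student, p_teacher)  # KL(student || teacher)
    return alpha * forward + (1.0 - alpha) * reverse


if __name__ == "__main__":
    teacher = [4.0, 1.0, 0.5]
    student = [2.0, 1.5, 1.0]
    for alpha in (0.0, 0.5, 1.0):
        print(alpha, balanced_kd_loss(teacher, student, alpha=alpha))
```

With `alpha=1.0` the sketch reduces to the forward KL alone, and with `alpha=0.0` to the reverse KL alone; intermediate values trade off the two terms.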