Curriculum Temperature for Knowledge Distillation

Zheng Li, Xiang Li, Lingfeng Yang, … (+4 authors), Jian Yang
Nov 2022
Abstract
Most existing distillation methods ignore the flexible role of the temperature in the loss function and fix it as a hyper-parameter that can be decided by an inefficient grid search. In general, the temperature controls the discrepancy between two distributions and can faithfully determine the difficulty level of the distillation task. Keeping a constant temperature, i.e., a fixed level of task difficulty, is usually sub-optimal for a growing student during its progressive learning stages. In this paper, we propose a simple curriculum-based technique, termed Curriculum Temperature for Knowledge Distillation (CTKD), which controls the task difficulty level during the student's learning career through a dynamic and learnable temperature. Specifically, following an easy-to-hard curriculum, we gradually increase the distillation loss w.r.t. the temperature, leading to increased distillation difficulty in an adversarial manner. As an easy-to-use plug-in technique, CTKD can be seamlessly integrated into existing knowledge distillation frameworks and brings general improvements at a negligible additional computation cost. Extensive experiments on CIFAR-100, ImageNet-2012, and MS-COCO demonstrate the effectiveness of our method. Our code is available at https://github.com/zhengli97/CTKD.
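The mechanism the abstract describes, a learnable temperature trained adversarially so that it increases the distillation loss while the student decreases it, with a curriculum weight that gradually raises the difficulty, can be sketched in a few lines of PyTorch. The snippet below is an illustrative reading of the abstract, not the authors' implementation (see https://github.com/zhengli97/CTKD for that); the GradReverse function, the single global temperature parameter, the clamp on its value, and the cosine ramp for the curriculum weight `lam` are all assumptions made for this sketch.

```python
# Minimal sketch of a curriculum, adversarially learned KD temperature.
# Not the official CTKD code; names and the schedule are assumptions.
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; scales the gradient by -lam in the
    backward pass, so the temperature ascends the distillation loss
    (adversarial) while the student descends it."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lam * grad_output, None

class LearnableTemperature(nn.Module):
    """A single global temperature, initialized near a common KD default."""
    def __init__(self, init_t=4.0):
        super().__init__()
        self.t = nn.Parameter(torch.tensor(init_t))

    def forward(self, lam):
        return GradReverse.apply(self.t, lam)

def ctkd_loss(student_logits, teacher_logits, temp_module, epoch, total_epochs):
    # Curriculum weight lam ramps from 0 to 1 (easy -> hard); a cosine
    # schedule is one plausible choice, assumed here.
    lam = 0.5 * (1.0 - math.cos(math.pi * min(epoch / total_epochs, 1.0)))
    # Pragmatic guard (an assumption) to keep the temperature positive.
    t = torch.clamp(temp_module(lam), min=0.1)
    # Standard KD loss (Hinton et al.), scaled by t^2 to keep gradient
    # magnitudes comparable across temperatures.
    log_p_s = F.log_softmax(student_logits / t, dim=1)
    p_t = F.softmax(teacher_logits / t, dim=1)
    return F.kl_div(log_p_s, p_t, reduction="batchmean") * t * t
```

In a training loop, `ctkd_loss` would replace the usual fixed-temperature KD term, with `temp_module`'s parameter added to the optimizer alongside the student's. Because the reversed gradient pushes the temperature uphill on the loss while the student moves downhill, the effective task difficulty grows as `lam` ramps from 0 to 1, matching the easy-to-hard curriculum the abstract describes.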