Prototype-guided Cross-task Knowledge Distillation for Large-scale Models

Deng Li, Aming Wu, Yahong Han, Qi Tian
Dec 2022
Recently, large-scale pre-trained models have shown their advantages in many tasks. However, due to their huge computational complexity and storage requirements, it is challenging to apply large-scale models to real-world scenarios. A common solution is knowledge distillation, which regards the large-scale model as a teacher and trains a small student model to obtain competitive performance. Cross-task knowledge distillation expands the application scenarios of the large-scale pre-trained model. Existing knowledge distillation works focus on directly mimicking the final prediction or the intermediate layers of the teacher model, which represent global-level characteristics and are task-specific. To alleviate the constraint of different label spaces, capturing invariant intrinsic local object characteristics (such as the shape characteristics of the legs and tails of cattle and horses) plays a key role. Considering the complexity and variability of real-scene tasks, we propose a Prototype-guided Cross-task Knowledge Distillation (ProC-KD) approach to transfer the intrinsic local-level object knowledge of a large-scale teacher network to various task scenarios. First, to better transfer the generalized knowledge of the teacher model in cross-task scenarios, we propose a prototype learning module that learns the essential feature representation of objects in the teacher model. Second, for diverse downstream tasks, we propose a task-adaptive feature augmentation module that enhances the student model's features with the learned generalized prototype features and guides the training of the student model to improve its generalization ability. Experimental results on various visual tasks demonstrate the effectiveness of our approach in large-scale model cross-task knowledge distillation scenarios.
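The two modules described above can be illustrated with a minimal sketch. This is not the paper's actual method: the prototype here is simply the mean teacher feature per category, the augmentation is a linear blend with the nearest prototype, and the distillation signal is a squared distance; all function names, the `alpha` mixing weight, and the toy 2-D features are assumptions for illustration.

```python
# Hedged sketch of prototype-guided cross-task distillation.
# Assumptions: prototypes = per-category mean of teacher features;
# augmentation = linear blend with the nearest prototype;
# distillation loss = squared distance to that prototype.

def mean_vector(vectors):
    """Element-wise mean of a list of equal-length feature vectors."""
    n, dim = len(vectors), len(vectors[0])
    return [sum(v[i] for v in vectors) / n for i in range(dim)]

def learn_prototypes(teacher_features):
    """Prototype learning (sketch): one prototype per object category,
    taken here as the mean teacher feature of that category."""
    return {label: mean_vector(feats) for label, feats in teacher_features.items()}

def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest_prototype(feature, prototypes):
    """Pick the prototype closest to the given feature."""
    return min(prototypes.values(), key=lambda p: sq_dist(feature, p))

def augment(student_feature, prototypes, alpha=0.5):
    """Task-adaptive feature augmentation (sketch): blend the student
    feature with its nearest generalized prototype."""
    p = nearest_prototype(student_feature, prototypes)
    return [(1 - alpha) * f + alpha * q for f, q in zip(student_feature, p)]

def distill_loss(student_feature, prototypes):
    """Guide the student toward the prototype space via squared distance."""
    return sq_dist(student_feature, nearest_prototype(student_feature, prototypes))
```

For example, with toy 2-D teacher features for two categories, `learn_prototypes` yields one mean vector per category, and `augment` pulls a student feature halfway toward its nearest prototype, so local characteristics shared across categories (in the spirit of the leg/tail example) dominate the transferred representation.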