Yang Chuanguang, An Zhulin, Zhou Helong, Zhuang Fuzhen, Xu Yongjun, Zhang Qian
IEEE Trans Pattern Anal Mach Intell. 2023 Aug;45(8):10212-10227. doi: 10.1109/TPAMI.2023.3257878. Epub 2023 Jun 30.
Teacher-free online Knowledge Distillation (KD) aims to collaboratively train a cohort of student models that distill knowledge from one another. Although existing online KD methods achieve desirable performance, they often focus on class probabilities as the core knowledge type, ignoring valuable feature representational information. We present a Mutual Contrastive Learning (MCL) framework for online KD. The core idea of MCL is to perform mutual interaction and transfer of contrastive distributions among a cohort of networks in an online manner. MCL aggregates cross-network embedding information and maximizes a lower bound on the mutual information between two networks. This enables each network to learn extra contrastive knowledge from the others, leading to better feature representations and thus improving performance on visual recognition tasks. Beyond the final layer, we extend MCL to intermediate layers and introduce an adaptive layer-matching mechanism trained by meta-optimization. Experiments on image classification and on transfer learning to other visual recognition tasks show that layer-wise MCL yields consistent performance gains over state-of-the-art online KD approaches. These gains demonstrate that layer-wise MCL guides the network to generate better feature representations. Our code is publicly available at https://github.com/winycg/L-MCL.
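The abstract describes maximizing a lower bound on the mutual information between two networks via contrastive interaction. The following is a minimal PyTorch sketch of one common way to realize such an objective: an InfoNCE-style loss between the embeddings that two cohort networks produce for the same batch. The function name, the symmetric formulation, and the two-network training step are illustrative assumptions, not the authors' implementation; refer to the linked repository for the actual L-MCL code.

```python
import torch
import torch.nn.functional as F

def mutual_contrastive_loss(z1, z2, temperature=0.07):
    """InfoNCE-style loss between two networks' embeddings of the same batch.

    z1, z2: [batch, dim] embeddings from network 1 and network 2 for the same
    images. Matching indices across networks are positives; every other
    cross-network embedding in the batch is a negative. Minimizing this loss
    maximizes a lower bound on the mutual information between the two
    embedding distributions (a sketch, not the paper's exact objective).
    """
    z1 = F.normalize(z1, dim=1)
    z2 = F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature                      # [batch, batch] similarities
    targets = torch.arange(z1.size(0), device=z1.device)    # positive = same index
    # Symmetric form: each network serves as the anchor in turn.
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

# Hypothetical collaborative training step for a two-network cohort,
# where each network returns (class_logits, embedding):
#   logits_a, feat_a = net_a(images)
#   logits_b, feat_b = net_b(images)
#   loss = (F.cross_entropy(logits_a, labels)
#           + F.cross_entropy(logits_b, labels)
#           + mutual_contrastive_loss(feat_a, feat_b))
#   loss.backward()
```

In the paper this kind of contrastive transfer is additionally applied at intermediate layers, with the layer pairing learned by meta-optimization rather than fixed as in this single-layer sketch.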