Zheng Zhaohui, Ye Rongguang, Hou Qibin, Ren Dongwei, Wang Ping, Zuo Wangmeng, Cheng Ming-Ming
IEEE Trans Pattern Anal Mach Intell. 2023 Aug;45(8):10070-10083. doi: 10.1109/TPAMI.2023.3248583. Epub 2023 Jun 30.
Previous knowledge distillation (KD) methods for object detection mostly focus on feature imitation rather than mimicking the prediction logits, because the latter is inefficient at distilling localization information. In this paper, we investigate whether logit mimicking always lags behind feature imitation. Toward this goal, we first present a novel localization distillation (LD) method that can efficiently transfer localization knowledge from the teacher to the student. Second, we introduce the concept of a valuable localization region, which aids in selectively distilling the classification and localization knowledge for a given region. Combining these two new components, we show for the first time that logit mimicking can outperform feature imitation, and that the absence of localization distillation is a key reason why logit mimicking has under-performed for years. Our thorough studies exhibit the great potential of logit mimicking: it can significantly alleviate localization ambiguity, learn robust feature representations, and ease training difficulty in the early stage. We also establish a theoretical connection between the proposed LD and classification KD, showing that they share an equivalent optimization effect. Our distillation scheme is simple yet effective and can be easily applied to both dense horizontal object detectors and rotated object detectors. Extensive experiments on the MS COCO, PASCAL VOC, and DOTA benchmarks demonstrate that our method achieves considerable AP improvements without sacrificing inference speed. Our source code and pretrained models are publicly available at https://github.com/HikariTJU/LD.
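To make the idea of distilling localization logits concrete, the sketch below computes a temperature-softened KL divergence between a teacher's and a student's distributions over discretized box-edge bins (the distributional box representation the abstract's logit-mimicking formulation relies on). This is a minimal illustration, not the paper's actual implementation: the function names, bin count, and temperature value are assumptions, and the real method is available in the linked repository.

```python
import numpy as np

def softened_softmax(logits, T):
    """Temperature-softened softmax along the last (bin) axis."""
    z = logits / T
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def ld_loss(student_logits, teacher_logits, T=10.0):
    """KL divergence between teacher and student box-edge distributions.

    Both inputs have shape (num_edges, num_bins): one row of logits per
    box edge (e.g., 4 rows for left/top/right/bottom of one box).
    The T^2 factor mirrors the standard scaling in classification KD.
    """
    p = softened_softmax(teacher_logits, T)  # teacher soft targets
    q = softened_softmax(student_logits, T)  # student predictions
    kl = np.sum(p * (np.log(p + 1e-12) - np.log(q + 1e-12)), axis=-1)
    return float(np.mean(kl) * T * T)

# Toy usage: 4 box edges, 8 bins each. Identical logits give zero loss;
# diverging student logits give a positive loss.
teacher = np.arange(32, dtype=np.float64).reshape(4, 8) * 0.1
print(ld_loss(teacher, teacher))            # → 0.0
print(ld_loss(np.zeros((4, 8)), teacher) > 0)  # → True
```

Because the loss is computed purely on prediction logits, it adds no cost at inference time, consistent with the abstract's claim of AP gains without any slowdown.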