Wang Hao, Jia Tong, Wang Qilong, Zuo Wangmeng
IEEE Trans Image Process. 2024;33:4796-4810. doi: 10.1109/TIP.2024.3445740. Epub 2024 Aug 30.
Balancing the trade-off between accuracy and speed, i.e., obtaining higher performance without sacrificing inference time, is a challenging topic in object detection. Knowledge distillation, as a model compression technique, provides a feasible way to handle this efficiency-effectiveness issue by transferring dark knowledge from a sophisticated teacher detector to a simple student one. Despite offering promising solutions for harmonizing accuracy and speed, current knowledge distillation methods for object detection still suffer from two limitations. First, most methods are inherited or adapted from frameworks designed for image classification and distill in an implicit manner, by imitating or constraining intermediate-layer features or output predictions between the teacher and student models, while paying little attention to the intrinsic relevance between the classification and localization predictions in object detection. Second, these methods fail to investigate the relationship between the detection and distillation tasks in the distillation pipeline, and train the whole network by simply combining the losses of these two tasks with hand-crafted weighting parameters. To address these issues, we propose a novel Relation Knowledge Distillation by Auxiliary Learning for Object Detection (ReAL) method. Specifically, we first design a prediction relation distillation module that makes the student model directly mimic the output predictions of the teacher, and construct self- and mutual-relation distillation losses to excavate the relation information between the teacher and student models. Moreover, to better delve into the relationship between the different tasks in the distillation pipeline, we introduce auxiliary learning into knowledge distillation for object detection and develop a dynamic weight adaptation strategy. By regarding detection as the primary task and distillation as the auxiliary task in the auxiliary learning framework, we dynamically adjust and regularize the corresponding loss weights during training. Experiments on the MS COCO dataset with various teacher-student detector combinations show that the proposed ReAL achieves obvious improvements across different distillation configurations while performing favorably against state-of-the-art methods.
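The abstract does not specify how the self- and mutual-relation distillation losses are formulated, so the following is only a minimal sketch of one plausible reading: per-image teacher and student prediction tensors of shape (N, D) (N proposals/anchors, D-dimensional concatenated classification and localization outputs) are compared through pairwise cosine-similarity relation matrices. The tensor shapes, the similarity measure, and the identity-matrix target for the mutual relation are all illustrative assumptions, not the published ReAL formulation.

```python
# Hedged sketch of self- and mutual-relation distillation losses (illustrative only).
import torch
import torch.nn.functional as F


def relation_matrix(preds: torch.Tensor) -> torch.Tensor:
    """Pairwise cosine-similarity relation matrix over N prediction vectors."""
    normed = F.normalize(preds, dim=1)   # (N, D), unit-norm rows
    return normed @ normed.t()           # (N, N) intra-model relations


def self_relation_loss(t_preds: torch.Tensor, s_preds: torch.Tensor) -> torch.Tensor:
    """Match the student's intra-model relation structure to the teacher's."""
    return F.mse_loss(relation_matrix(s_preds), relation_matrix(t_preds).detach())


def mutual_relation_loss(t_preds: torch.Tensor, s_preds: torch.Tensor) -> torch.Tensor:
    """Cross-model relations: each student prediction should relate most
    strongly to its corresponding teacher prediction (assumed target)."""
    t_norm = F.normalize(t_preds, dim=1).detach()
    s_norm = F.normalize(s_preds, dim=1)
    cross = s_norm @ t_norm.t()                               # (N, N) student-teacher relations
    target = torch.eye(cross.size(0), device=cross.device)    # identity target is an assumption
    return F.mse_loss(cross, target)
```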
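The dynamic weight adaptation strategy is likewise only described at a high level, so the sketch below shows one common auxiliary-learning heuristic rather than the actual ReAL rule: the distillation (auxiliary) loss is down-weighted when its gradient on the shared student parameters conflicts with the detection (primary) gradient, measured by cosine similarity. The function name, the gradient-similarity rule, and the usage snippet are assumptions for illustration.

```python
# Hedged sketch of weighting the auxiliary distillation loss against the primary
# detection loss via gradient cosine similarity (one possible heuristic, not
# necessarily the strategy used in ReAL).
import torch


def dynamic_aux_weight(det_loss: torch.Tensor,
                       distill_loss: torch.Tensor,
                       shared_params) -> torch.Tensor:
    """Return a weight in [0, 1] for the auxiliary (distillation) loss."""
    shared_params = list(shared_params)
    g_det = torch.autograd.grad(det_loss, shared_params,
                                retain_graph=True, allow_unused=True)
    g_dis = torch.autograd.grad(distill_loss, shared_params,
                                retain_graph=True, allow_unused=True)
    # Keep only parameters that receive gradients from both losses.
    pairs = [(gd, ga) for gd, ga in zip(g_det, g_dis)
             if gd is not None and ga is not None]
    flat_det = torch.cat([gd.flatten() for gd, _ in pairs])
    flat_dis = torch.cat([ga.flatten() for _, ga in pairs])
    cos = torch.nn.functional.cosine_similarity(flat_det, flat_dis, dim=0)
    return cos.clamp(min=0.0).detach()   # suppress the auxiliary task when gradients conflict


# Hypothetical usage inside a training step (losses computed elsewhere):
# w = dynamic_aux_weight(det_loss, distill_loss, student.parameters())
# total_loss = det_loss + w * distill_loss
# total_loss.backward()
```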