IEEE Trans Image Process. 2023;32:2093-2106. doi: 10.1109/TIP.2023.3263105.
Knowledge amalgamation (KA) is a novel deep-model reuse task that aims to transfer knowledge from several well-trained teachers to a multi-talented and compact student. Most existing approaches are tailored to convolutional neural networks (CNNs). However, Transformers, with a completely different architecture, are beginning to challenge the dominance of CNNs in many computer vision tasks, and directly applying previous KA methods to Transformers leads to severe performance degradation. In this work, we explore a more effective KA scheme for Transformer-based object detection models. Specifically, considering the architectural characteristics of Transformers, we propose to decompose KA into two aspects: sequence-level amalgamation (SA) and task-level amalgamation (TA). In sequence-level amalgamation, a hint is generated by concatenating teacher sequences rather than redundantly aggregating them into a single fixed-size sequence, as previous KA approaches do. In task-level amalgamation, the student efficiently learns heterogeneous detection tasks through soft targets. Extensive experiments on PASCAL VOC and COCO show that sequence-level amalgamation significantly boosts student performance, whereas previous methods impair it. Moreover, Transformer-based students excel at learning the amalgamated knowledge: they master heterogeneous detection tasks rapidly and achieve performance superior, or at least comparable, to that of the teachers in their own specializations.
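The contrast between the two hint strategies can be illustrated with a minimal sketch. All names and shapes below are our own illustrative assumptions, not the paper's implementation: concatenating teacher token sequences preserves every token, while a fixed-size aggregation (here, mean pooling) collapses the sequences and discards per-token information.

```python
import numpy as np

# Hypothetical setup: two teachers emit token sequences of different
# lengths but the same embedding width d (shapes are illustrative).
d = 8
teacher_a = np.random.randn(100, d)   # e.g. 100 tokens from teacher A
teacher_b = np.random.randn(196, d)   # e.g. 196 tokens from teacher B

# Sequence-level hint: concatenate along the token axis, keeping
# every teacher token intact for the student to attend over.
hint = np.concatenate([teacher_a, teacher_b], axis=0)
assert hint.shape == (296, d)

# Earlier CNN-oriented KA instead aggregates teacher features into a
# single fixed-size vector (mean pooling here), losing token structure.
pooled = np.stack([teacher_a.mean(axis=0),
                   teacher_b.mean(axis=0)]).mean(axis=0)
assert pooled.shape == (d,)
```

The point of the sketch is only the shape bookkeeping: the concatenated hint grows with the teachers' sequence lengths, whereas the pooled hint is a constant-size summary regardless of how much the teachers produced.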