Beijing Key Laboratory of Information Service Engineering, Beijing Union University, Beijing 100101, China.
College of Robotics, Beijing Union University, Beijing 100101, China.
Sensors (Basel). 2022 Jun 26;22(13):4833. doi: 10.3390/s22134833.
Object detection plays a vital role in autonomous driving systems: accurate detection of surrounding objects helps ensure that vehicles drive safely. This paper proposes DetectFormer, a category-assisted transformer object detector for autonomous driving that achieves better accuracy than the baseline. Specifically, the ClassDecoder is assisted by proposal categories and by global information from the Global Extract Encoder (GEE) to improve category sensitivity and detection performance; this fits the distribution of object categories in specific scene backgrounds and the connection between objects and the image context. Data augmentation is used to improve robustness, and an attention mechanism is added to the backbone network to extract channel-wise spatial features and direction information. Benchmark results show that the proposed method achieves higher real-time detection performance in traffic scenes than RetinaNet and FCOS, reaching 97.6% AP50 and 91.4% AP75 on the BCTSDB dataset.
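The abstract does not specify how the backbone attention mechanism is implemented. A common way to capture both channel-wise features and direction information is coordinate-attention-style pooling: aggregate the feature map along one spatial axis at a time, so each descriptor retains position along the other axis, then gate the feature map with the two directional attention maps. The sketch below is a minimal, hedged illustration of that idea in numpy; the function name `coord_attention` and the channel-mixing matrices `w_h` and `w_w` (standing in for learned 1x1 convolutions) are assumptions, not the paper's actual module.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def coord_attention(x, w_h, w_w):
    """Direction-aware attention sketch (hypothetical, not the paper's exact module).

    x:    feature map of shape (C, H, W)
    w_h:  (C, C) channel-mixing matrix for the height branch (stand-in for a 1x1 conv)
    w_w:  (C, C) channel-mixing matrix for the width branch
    """
    # Pool along one spatial axis at a time: each descriptor keeps
    # positional information along the remaining axis.
    pool_h = x.mean(axis=2)          # (C, H): per-row channel descriptor
    pool_w = x.mean(axis=1)          # (C, W): per-column channel descriptor
    # Channel mixing + sigmoid produce directional attention gates in (0, 1).
    att_h = sigmoid(w_h @ pool_h)    # (C, H): attention along height
    att_w = sigmoid(w_w @ pool_w)    # (C, W): attention along width
    # Reweight the feature map by broadcasting both directional gates.
    return x * att_h[:, :, None] * att_w[:, None, :]

rng = np.random.default_rng(0)
C, H, W = 4, 8, 8
x = rng.standard_normal((C, H, W))
out = coord_attention(x, rng.standard_normal((C, C)), rng.standard_normal((C, C)))
print(out.shape)
```

Because both gates lie in (0, 1), the module can only attenuate activations, never amplify them; a learned implementation would typically add normalization and a nonlinearity between the pooling and gating steps.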