SMIFormer: Learning Spatial Feature Representation for 3D Object Detection from 4D Imaging Radar via Multi-View Interactive Transformers

Authors

Shi Weigang, Zhu Ziming, Zhang Kezhi, Chen Huanlei, Yu Zhuoping, Zhu Yu

Affiliations

School of Automotive Studies, Tongji University, Shanghai 201804, China.

School of Information Science and Engineering, East China University of Science and Technology, Shanghai 200237, China.

Publication information

Sensors (Basel). 2023 Nov 27;23(23):9429. doi: 10.3390/s23239429.

DOI:10.3390/s23239429

PMID:38067802

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC10708838/

Abstract

4D millimeter-wave (mmWave) imaging radar is a new type of vehicle sensor that is critical to autonomous driving systems because of its lower cost and robustness in complex weather. However, the sparsity and noise of its point clouds remain the main obstacles to the practical application of 4D imaging radar. In this paper, we introduce SMIFormer, a multi-view feature fusion network that takes single-modal 4D radar input. SMIFormer decouples the 3D point cloud scene into three independent but interrelated views, namely bird's-eye view (BEV), front view (FV), and side view (SV), thereby modeling the entire 3D scene better and overcoming the insufficient feature representation of a single view built from extremely sparse point clouds. For the multi-view features, we propose multi-view feature interaction (MVI), which exploits the inner relationships between views by integrating features from intra-view and cross-view interaction. We evaluated SMIFormer on the View-of-Delft (VoD) dataset. The mAP of our method reached 48.77 and 71.13 in the fully annotated area and the driving corridor area, respectively. This shows that 4D radar has great potential for 3D object detection.
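The three-view decoupling described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: SMIFormer operates on learned features, whereas the hypothetical `project_to_views` helper below only counts points per grid cell. The axis convention (x forward, y left, z up) and the `voxel_size` parameter are assumptions made for illustration.

```python
from collections import defaultdict


def project_to_views(points, voxel_size=1.0):
    """Project 3D points (x, y, z) onto three orthogonal planes.

    Assumed axis convention: x forward, y left, z up.
    BEV drops z (x-y plane), FV drops x (y-z plane), SV drops y (x-z plane).
    Returns a dict mapping each view name to {grid cell: point count}.
    """
    views = {
        "bev": defaultdict(int),
        "fv": defaultdict(int),
        "sv": defaultdict(int),
    }
    for x, y, z in points:
        # Quantize each coordinate to an integer grid index.
        i = int(x // voxel_size)
        j = int(y // voxel_size)
        k = int(z // voxel_size)
        views["bev"][(i, j)] += 1
        views["fv"][(j, k)] += 1
        views["sv"][(i, k)] += 1
    return {name: dict(cells) for name, cells in views.items()}


# Three sparse points; the first two differ mainly in height (z).
points = [(0.2, 0.3, 0.1), (0.7, 0.4, 1.5), (2.1, 0.3, 0.2)]
views = project_to_views(points)
```

In this toy example the first two points fall into the same BEV cell but into different SV cells, which hints at why a single view built from a sparse point cloud can lose information that another view retains.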
