Zhai Zhenyu, Wang Qiantong, Pan Zongxu, Gao Zhentong, Hu Wenlong
Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100190, China.
Key Laboratory of Technology in Geo-Spatial Information Processing and Application System, Chinese Academy of Sciences, Beijing 100190, China.
Sensors (Basel). 2022 Oct 2;22(19):7473. doi: 10.3390/s22197473.
Object detection from continuous frames of point clouds is a new research direction. Currently, most studies fuse multi-frame point clouds using concatenation-based methods, which align the frames using information from GPS, IMU, etc. However, this kind of fusion can only align static objects, not moving ones. In this paper, we propose a non-local multi-scale feature fusion method that handles both moving and static objects without GPS- or IMU-based registration. Because non-local methods are resource-intensive, we propose a novel simplified non-local block that exploits the sparsity of the point cloud: by filtering out empty units, memory consumption is reduced by 99.93%. In addition, triple attention is adopted to enhance key object information and suppress background noise, further benefiting the non-local feature fusion. Finally, we verify the method on PointPillars and CenterPoint. Experimental results show that the proposed method improves mAP by 3.9% and 4.1% over the concatenation-based fusion baselines PointPillars-2 and CenterPoint-2, respectively. The proposed network also outperforms the powerful 3D-VID by 1.2% in mAP.
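The core idea of the simplified non-local block can be illustrated with a minimal sketch. This is not the paper's implementation; it is a hypothetical NumPy example assuming an embedded-Gaussian-style non-local operation over a pillar pseudo-image in which empty pillars are all-zero cells. Filtering those cells out shrinks the attention matrix from (H·W)² to N², where N is the number of occupied units, which is the source of the memory saving:

```python
import numpy as np

def sparse_non_local(feat):
    """Simplified non-local block computed only over non-empty cells.

    feat: (H, W, C) pillar pseudo-image; empty pillars are all-zero.
    Returns an array of the same shape where occupied cells are replaced
    by attention-weighted aggregates of all other occupied cells.
    (Illustrative sketch, not the authors' exact formulation.)
    """
    H, W, C = feat.shape
    flat = feat.reshape(-1, C)                   # (H*W, C)
    occupied = np.any(flat != 0, axis=1)         # mask of non-empty units
    x = flat[occupied]                           # (N, C), N << H*W in sparse scenes
    if x.shape[0] == 0:
        return feat.copy()

    # softmax(x x^T) x -- dense attention restricted to occupied cells
    sim = x @ x.T                                # (N, N) pairwise similarity
    sim -= sim.max(axis=1, keepdims=True)        # numerical stability
    attn = np.exp(sim)
    attn /= attn.sum(axis=1, keepdims=True)
    y = attn @ x                                 # aggregated features

    out = flat.copy()
    out[occupied] = y                            # a residual connection could be added
    return out.reshape(H, W, C)
```

For a 496×432 pillar grid with only ~2% of cells occupied, the attention matrix drops from roughly 214k×214k entries to about 4k×4k, which is consistent in spirit with the large memory reduction reported above.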