

Dense projection fusion for 3D object detection.

Authors

Chen Zhao, Hu Bin-Jie, Luo Chengxi, Chen Guohao, Zhu Haohui

Affiliation

School of Electronic and Information Engineering, South China University of Technology, Guangzhou, 510640, China.

Publication

Sci Rep. 2024 Oct 8;14(1):23492. doi: 10.1038/s41598-024-74679-9.

Abstract

Fusing information from LiDAR and cameras can effectively enhance the overall perception capability of autonomous vehicles across various scenarios. Although point-wise fusion and Bird's-Eye-View (BEV) fusion achieve relatively good results, they still cannot fully leverage image information and lack effective depth information. In typical fusion methods, the multi-modal features are first concatenated along the channel dimension, and the fused features are then extracted with convolutional layers. Although effective, this kind of fusion is too coarse: the fused features cannot focus on regions with important features and suffer from severe noise. To tackle these issues, we propose a Dense Projection Fusion (DPFusion) approach. It consists of two new modules: a dense depth map guided BEV transform (DGBT) module and a multi-modal feature adaptive fusion (MFAF) module. The DGBT module first quickly estimates the depth of each pixel and then projects all image features into the BEV space, making full use of the image information. The MFAF module computes an image weight and a point cloud weight for each channel in each BEV grid cell, and then adaptively weights and fuses the image BEV features with the point cloud BEV features. Notably, the MFAF module makes the fused features attend more to background and object outlines. Our proposed DPFusion demonstrates competitive results in 3D object detection, achieving a mean Average Precision (mAP) of 70.4 and a nuScenes detection score (NDS) of 72.3 on the nuScenes validation set.
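The abstract does not give the DGBT module's exact architecture, but the flow it describes (estimate a depth for each pixel, then project all image features into BEV space) matches the well-known lift-splat pattern. Below is a minimal PyTorch sketch of that pattern; the DepthGuidedLift name, the 1x1-conv depth head, and the number of depth bins are illustrative assumptions, not the paper's design.

import torch
import torch.nn as nn

class DepthGuidedLift(nn.Module):
    """Hypothetical sketch: predict a per-pixel depth distribution and
    spread image features over depth bins, giving a frustum of features
    that a later step can pool onto the BEV grid."""

    def __init__(self, channels: int, depth_bins: int):
        super().__init__()
        # 1x1 conv predicting a categorical depth distribution per pixel
        # (assumed head; the paper's exact depth estimator is not given).
        self.depth_head = nn.Conv2d(channels, depth_bins, kernel_size=1)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (B, C, H, W) image features from the camera backbone.
        depth_prob = self.depth_head(feats).softmax(dim=1)      # (B, D, H, W)
        # Outer product: every pixel feature is weighted into every depth
        # bin, so all image features are used, not just sparse points.
        frustum = depth_prob.unsqueeze(1) * feats.unsqueeze(2)  # (B, C, D, H, W)
        return frustum

# Example: 64-channel features on a 32x88 feature map with 60 depth bins.
lift = DepthGuidedLift(channels=64, depth_bins=60)
frustum = lift(torch.randn(1, 64, 32, 88))  # -> (1, 64, 60, 32, 88)

A full pipeline would follow this with a voxel-pooling step that splats the frustum features onto the BEV grid using the camera intrinsics and extrinsics.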

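Similarly, the MFAF module is described only at a high level: per-channel, per-grid weights for the image and point cloud BEV features, followed by adaptive weighted fusion. The sketch below implements that description with a 1x1 convolution predicting the two weight maps and a softmax normalising them per cell; the layer names and the weighting network are assumptions.

import torch
import torch.nn as nn

class AdaptiveBEVFusion(nn.Module):
    """Hypothetical sketch: fuse image-BEV and point-cloud-BEV features
    with learned weights computed per channel and per BEV grid cell."""

    def __init__(self, channels: int):
        super().__init__()
        # 1x1 conv predicting one weight logit per modality, channel, and
        # grid cell (assumed design; the paper's exact network is not given).
        self.weight_net = nn.Conv2d(2 * channels, 2 * channels, kernel_size=1)

    def forward(self, img_bev: torch.Tensor, pc_bev: torch.Tensor) -> torch.Tensor:
        # img_bev, pc_bev: (B, C, H, W) BEV maps from the two branches.
        logits = self.weight_net(torch.cat([img_bev, pc_bev], dim=1))
        w_img, w_pc = logits.chunk(2, dim=1)                  # (B, C, H, W) each
        # Normalise so the two modality weights sum to 1 in every channel
        # of every grid cell, then fuse by weighted sum.
        weights = torch.softmax(torch.stack([w_img, w_pc]), dim=0)
        return weights[0] * img_bev + weights[1] * pc_bev

# Example: fuse 256-channel BEV maps on a 180x180 grid.
fuse = AdaptiveBEVFusion(channels=256)
fused = fuse(torch.randn(1, 256, 180, 180), torch.randn(1, 256, 180, 180))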

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3cef/11461888/2c5718015f26/41598_2024_74679_Fig1_HTML.jpg

Similar Articles

Fast-BEV: A Fast and Strong Bird's-Eye View Perception Baseline.
IEEE Trans Pattern Anal Mach Intell. 2024 Dec;46(12):8665-8679. doi: 10.1109/TPAMI.2024.3414835. Epub 2024 Nov 6.

BiFNet: Bidirectional Fusion Network for Road Segmentation.
IEEE Trans Cybern. 2022 Sep;52(9):8617-8628. doi: 10.1109/TCYB.2021.3105488. Epub 2022 Aug 18.

Fully Sparse Fusion for 3D Object Detection.
IEEE Trans Pattern Anal Mach Intell. 2024 Nov;46(11):7217-7231. doi: 10.1109/TPAMI.2024.3392303. Epub 2024 Oct 3.
