Semantic-Enhanced and Temporally Refined Bidirectional BEV Fusion for LiDAR-Camera 3D Object Detection.

Authors

Qu Xiangjun, Qin Kai, Li Yaping, Zhang Shuaizhang, Li Yuchen, Shen Sizhe, Gao Yun

Affiliations

Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China.

School of Electronic, Electrical and Communication Engineering, University of Chinese Academy of Sciences, Beijing 101408, China.

Publication

J Imaging. 2025 Sep 18;11(9):319. doi: 10.3390/jimaging11090319.

Abstract

In domains such as autonomous driving, 3D object detection is a key technology for environmental perception. Integrating multimodal information from sensors such as LiDAR and cameras can significantly improve detection accuracy. However, current multimodal fusion frameworks still suffer from two problems: first, owing to the inherent physical limitations of LiDAR, point clouds of distant objects are sparse, so small objects are easily overwhelmed by the background; second, cross-modal information interaction is insufficient, and the complementarity and correlation between the LiDAR point cloud and the camera image are not fully exploited. We therefore propose a new multimodal detection strategy, Semantic-Enhanced and Temporally Refined Bidirectional BEV Fusion (SETR-Fusion). The method integrates three key components: the Discriminative Semantic Saliency Activation (DSSA) module, the Temporally Consistent Semantic Point Fusion (TCSP) module, and the Bilateral Cross-Attention Fusion (BCAF) module. The DSSA module exploits image semantic features to capture more discriminative foreground and background cues; the TCSP module generates semantic LiDAR points and, after noise filtering, produces a more accurate semantic LiDAR point cloud; and the BCAF module applies cross-attention to camera and LiDAR BEV features in both directions, enabling strong interaction between the two modalities. SETR-Fusion achieves 71.2% mAP and 73.3% NDS on the nuScenes test set, outperforming several state-of-the-art methods.
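To make the bidirectional fusion idea concrete, below is a minimal PyTorch sketch of cross-attention applied to camera and LiDAR BEV features in both directions, in the spirit of the BCAF module described in the abstract. It is an illustrative sketch under stated assumptions: the class name, dimensions, and the concatenate-and-project fusion head are hypothetical choices, not the authors' implementation.

```python
# Minimal sketch of bidirectional cross-attention over BEV feature maps.
# Hypothetical names and dimensions; the paper's actual BCAF module may differ.
import torch
import torch.nn as nn


class BidirectionalBEVCrossAttention(nn.Module):
    """Fuse camera and LiDAR BEV features with cross-attention in both directions."""

    def __init__(self, embed_dim: int = 128, num_heads: int = 4):
        super().__init__()
        # One attention block per direction: camera->LiDAR and LiDAR->camera.
        self.cam_to_lidar = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        self.lidar_to_cam = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        # Simple fusion head: concatenate both enhanced streams, project back.
        self.fuse = nn.Linear(2 * embed_dim, embed_dim)

    def forward(self, cam_bev: torch.Tensor, lidar_bev: torch.Tensor) -> torch.Tensor:
        # cam_bev, lidar_bev: (B, C, H, W) BEV grids from each modality.
        b, c, h, w = cam_bev.shape
        cam = cam_bev.flatten(2).transpose(1, 2)      # (B, H*W, C) token sequence
        lidar = lidar_bev.flatten(2).transpose(1, 2)  # (B, H*W, C) token sequence

        # Direction 1: camera queries enriched by LiDAR keys/values.
        cam_enh, _ = self.cam_to_lidar(query=cam, key=lidar, value=lidar)
        # Direction 2: LiDAR queries enriched by camera keys/values.
        lidar_enh, _ = self.lidar_to_cam(query=lidar, key=cam, value=cam)

        # Merge both directions into a single fused BEV map.
        fused = self.fuse(torch.cat([cam_enh, lidar_enh], dim=-1))
        return fused.transpose(1, 2).reshape(b, c, h, w)


if __name__ == "__main__":
    # Toy usage on a small 32x32 BEV grid.
    cam_bev = torch.randn(2, 128, 32, 32)
    lidar_bev = torch.randn(2, 128, 32, 32)
    fused = BidirectionalBEVCrossAttention()(cam_bev, lidar_bev)
    print(fused.shape)  # torch.Size([2, 128, 32, 32])
```

The concatenate-and-project head is only one plausible way to merge the two attended streams; gated or residual fusion would also fit the bidirectional pattern the abstract describes.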

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a3d4/12470275/b2cf80b0a621/jimaging-11-00319-g001.jpg
