Cheng Haixing, Liu Chengyong, Gu Wenzhe, Wu Yuyi, Zhao Mengye, Liu Wentao, Wang Naibang
China Coal Energy Research Institute Co., Ltd., Xi'an, Shaanxi Province, China.
School of Mechanical and Electrical Engineering, China University of Mining and Technology (Beijing), Beijing, China.
PLoS One. 2025 Sep 4;20(9):e0331195. doi: 10.1371/journal.pone.0331195. eCollection 2025.
Multi-modal data fusion plays a critical role in enhancing the accuracy and robustness of perception systems for autonomous driving, especially for the detection of small objects. However, small object detection remains particularly challenging due to sparse LiDAR points and low-resolution image features, which often lead to missed or imprecise detections. Many current methods process LiDAR point clouds and visible-light camera images separately and then fuse them in the detection head. These approaches often fail to fully exploit the advantages of multi-modal sensors and overlook the potential for strengthening the correlation between modalities before feature fusion. To address this, we propose a novel LiDAR-guided multi-modal fusion framework for object detection, called LGMMfusion. This framework leverages the depth information from LiDAR to guide the generation of image Bird's Eye View (BEV) features. Specifically, LGMMfusion promotes spatial interaction between point clouds and pixels before the fusion of LiDAR BEV and image BEV features, enabling the generation of higher-quality image BEV features. To better align image and LiDAR features, we incorporate a multi-head multi-scale self-attention mechanism and a multi-head adaptive cross-attention mechanism, using the prior depth information from point clouds to generate image BEV features that better match the spatial positions of LiDAR BEV features. Finally, the LiDAR BEV features and image BEV features are fused to provide enhanced features for the detection head. Experimental results show that LGMMfusion achieves 71.1% NDS and 67.3% mAP on the nuScenes validation set, while also improving small object detection and raising accuracy for most object categories.
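The following is a minimal, self-contained sketch (not the authors' released code) of the idea summarized in the abstract: a LiDAR depth prior guides attention that lifts image features into BEV space, and the resulting image BEV map is fused with the LiDAR BEV map before the detection head. All module names, tensor shapes, and layer choices (e.g. `DepthGuidedImageBEV`, `BEVFusion`, the use of `nn.MultiheadAttention` as a stand-in for the paper's multi-scale self-attention and adaptive cross-attention) are illustrative assumptions.

```python
# Hypothetical sketch of LiDAR-depth-guided image BEV generation and BEV fusion.
# This is an assumption-laden illustration of the abstract's pipeline, not LGMMfusion itself.
import torch
import torch.nn as nn


class DepthGuidedImageBEV(nn.Module):
    """Generate image BEV features with LiDAR-depth-guided attention (illustrative)."""

    def __init__(self, dim=128, num_heads=4, bev_h=32, bev_w=32):
        super().__init__()
        self.bev_h, self.bev_w = bev_h, bev_w
        # One learnable query per BEV grid cell (assumption).
        self.bev_queries = nn.Parameter(torch.randn(bev_h * bev_w, dim) * 0.02)
        # Projects a LiDAR depth prior rasterised onto the BEV grid into the query
        # space, so queries carry the point-cloud geometry before attending to pixels.
        self.depth_prior_proj = nn.Linear(1, dim)
        # Stand-in for the paper's multi-head multi-scale self-attention.
        self.self_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # Stand-in for the paper's multi-head adaptive cross-attention to image features.
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, img_feats, lidar_depth_bev):
        # img_feats:       (B, N_img_tokens, C) flattened multi-camera features
        # lidar_depth_bev: (B, bev_h * bev_w, 1) per-cell depth/occupancy prior from LiDAR
        B = img_feats.shape[0]
        q = self.bev_queries.unsqueeze(0).expand(B, -1, -1)
        q = q + self.depth_prior_proj(lidar_depth_bev)          # inject LiDAR depth prior
        q = self.norm1(q + self.self_attn(q, q, q)[0])          # refine BEV queries
        q = self.norm2(q + self.cross_attn(q, img_feats, img_feats)[0])  # gather image cues
        # Reshape query sequence into a BEV feature map: (B, C, bev_h, bev_w)
        return q.transpose(1, 2).reshape(B, -1, self.bev_h, self.bev_w)


class BEVFusion(nn.Module):
    """Fuse LiDAR BEV and image BEV features for the detection head (illustrative)."""

    def __init__(self, dim=128):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Conv2d(2 * dim, dim, kernel_size=3, padding=1),
            nn.BatchNorm2d(dim),
            nn.ReLU(inplace=True),
        )

    def forward(self, lidar_bev, img_bev):
        return self.fuse(torch.cat([lidar_bev, img_bev], dim=1))


if __name__ == "__main__":
    B, C, H, W = 1, 128, 32, 32
    img_feats = torch.randn(B, 6 * 25 * 15, C)     # e.g. 6 cameras, flattened feature maps
    depth_prior = torch.rand(B, H * W, 1)          # rasterised LiDAR depth prior on the BEV grid
    lidar_bev = torch.randn(B, C, H, W)            # LiDAR BEV features from a point-cloud branch
    img_bev = DepthGuidedImageBEV(C, 4, H, W)(img_feats, depth_prior)
    fused = BEVFusion(C)(lidar_bev, img_bev)
    print(fused.shape)                             # torch.Size([1, 128, 32, 32])
```

The key design point mirrored here is the ordering: the LiDAR prior conditions the image-to-BEV lifting before fusion, rather than the two modalities only meeting in the detection head.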