A Cross-Modal Attention-Driven Multi-Sensor Fusion Method for Semantic Segmentation of Point Clouds.

Authors

Shi Huisheng, Wang Xin, Zhao Jianghong, Hua Xinnan

Affiliations

Department of Remote Sensing Engineering, Henan College of Surveying and Mapping, Zhengzhou 451464, China.

Beiqi Foton Motor Co., Ltd., Beijing 102206, China.

Publication

Sensors (Basel). 2025 Apr 14;25(8):2474. doi: 10.3390/s25082474.

DOI: 10.3390/s25082474
PMID: 40285164
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12031233/
Abstract

To bridge the modality gap between camera images and LiDAR point clouds in autonomous driving systems, a critical challenge exacerbated by current fusion methods' inability to effectively integrate cross-modal features, we propose the Cross-Modal Fusion (CMF) framework. This attention-driven architecture enables hierarchical multi-sensor data fusion and achieves state-of-the-art performance in semantic segmentation tasks. The CMF framework first projects point clouds into camera coordinates via perspective projection, providing spatial depth information for the RGB images. A two-stream feature extraction network then extracts features from the two modalities separately, and a residual fusion module (RCF) with cross-modal attention realizes multilevel fusion of the two modalities. Finally, we design a perceptual alignment loss that integrates cross-entropy with feature-matching terms, effectively minimizing the semantic discrepancy between camera and LiDAR representations during fusion. Experimental results on the SemanticKITTI and nuScenes benchmark datasets show that CMF achieves mean intersection-over-union (mIoU) scores of 64.2% and 79.3%, respectively, outperforming existing state-of-the-art methods in accuracy and exhibiting greater robustness in complex scenarios. Ablation studies further confirm that strengthening feature interaction and fusion through cross-modal attention and the perceptually guided cross-entropy loss (Pgce) improves both segmentation accuracy and robustness.
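The projection step described in the abstract, mapping LiDAR points into the camera frame so that each RGB pixel can carry depth, is conventionally done with a pinhole camera model. Below is a minimal sketch under that assumption; K, R, and t stand for calibrated camera intrinsics and LiDAR-to-camera extrinsics, and all names are illustrative rather than taken from the paper.

```python
# Minimal sketch of perspective projection of a LiDAR point cloud onto the
# image plane, assuming a standard pinhole model with calibrated intrinsics K
# (3x3) and LiDAR-to-camera extrinsics R (3x3), t (3,). Not the paper's code.
import numpy as np

def project_lidar_to_image(points, K, R, t, h, w):
    """points: (N, 3) LiDAR xyz; returns an (h, w) sparse depth map."""
    cam = points @ R.T + t                    # LiDAR frame -> camera frame
    cam = cam[cam[:, 2] > 0]                  # keep points in front of the camera
    uv = cam @ K.T                            # perspective projection
    uv = uv[:, :2] / uv[:, 2:3]               # normalize by depth
    u, v = uv[:, 0].astype(int), uv[:, 1].astype(int)
    valid = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    depth = np.zeros((h, w), dtype=np.float32)
    # Collisions keep the last point written; a real pipeline would keep the nearest.
    depth[v[valid], u[valid]] = cam[valid, 2]
    return depth
```

The resulting depth map can be concatenated with the RGB image as an extra channel, which is one way to supply the camera stream with the spatial depth information the abstract mentions.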

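The abstract names a residual fusion module (RCF) with cross-modal attention but does not describe its internals. The block below is a hypothetical sketch of one common pattern consistent with that description: camera features query LiDAR features through multi-head attention, and the result is added back residually. Applying one such block at each encoder level would realize the multilevel fusion the abstract describes.

```python
# Hypothetical cross-modal attention fusion block in the spirit of the RCF
# module named in the abstract; the paper's actual architecture is not
# reproduced here. Both feature maps are assumed spatially aligned (e.g., via
# the projection above) and to share the channel count C.
import torch
import torch.nn as nn

class CrossModalFusion(nn.Module):
    def __init__(self, channels, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)
        self.norm = nn.LayerNorm(channels)

    def forward(self, img_feat, lidar_feat):
        # img_feat, lidar_feat: (B, C, H, W) aligned feature maps
        b, c, h, w = img_feat.shape
        q = img_feat.flatten(2).transpose(1, 2)     # (B, H*W, C) queries from camera
        kv = lidar_feat.flatten(2).transpose(1, 2)  # (B, H*W, C) keys/values from LiDAR
        fused, _ = self.attn(q, kv, kv)             # cross-modal attention
        fused = self.norm(q + fused)                # residual connection
        return fused.transpose(1, 2).reshape(b, c, h, w)
```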

Figures

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/36b8/12031233/db74ed69c590/sensors-25-02474-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/36b8/12031233/a39b1ebf0158/sensors-25-02474-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/36b8/12031233/71856231a78c/sensors-25-02474-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/36b8/12031233/ffba7906282d/sensors-25-02474-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/36b8/12031233/c385e5b253f8/sensors-25-02474-g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/36b8/12031233/2be5f5fa2611/sensors-25-02474-g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/36b8/12031233/30c1bbbc8ae3/sensors-25-02474-g007.jpg
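For completeness, the loss and metric from the abstract can also be sketched. The perceptual alignment loss integrates cross-entropy with feature-matching terms; the weighting factor `lam` and the choice of MSE for the matching term below are assumptions, not the paper's exact formulation, and the mIoU helper simply implements the standard metric the paper reports (64.2% on SemanticKITTI, 79.3% on nuScenes).

```python
# Minimal sketches, not the paper's code: a perceptual-alignment-style loss
# (cross-entropy plus a feature-matching term that pulls camera and LiDAR
# representations together) and the standard mean-IoU metric.
import torch
import torch.nn.functional as F

def perceptual_alignment_loss(logits, labels, cam_feat, lidar_feat, lam=0.1):
    # logits: (B, num_classes, H, W); labels: (B, H, W) class indices
    ce = F.cross_entropy(logits, labels, ignore_index=255)   # segmentation term
    match = F.mse_loss(cam_feat, lidar_feat)                 # cross-modal feature matching
    return ce + lam * match                                  # lam is an assumed weight

def mean_iou(pred, labels, num_classes):
    # pred, labels: flattened tensors of class indices
    ious = []
    for c in range(num_classes):
        inter = ((pred == c) & (labels == c)).sum().item()
        union = ((pred == c) | (labels == c)).sum().item()
        if union > 0:
            ious.append(inter / union)
    return sum(ious) / max(len(ious), 1)
```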

Similar Articles

1. A Cross-Modal Attention-Driven Multi-Sensor Fusion Method for Semantic Segmentation of Point Clouds. Sensors (Basel). 2025 Apr 14;25(8):2474. doi: 10.3390/s25082474.
2. EPMF: Efficient Perception-Aware Multi-Sensor Fusion for 3D Semantic Segmentation. IEEE Trans Pattern Anal Mach Intell. 2024 Dec;46(12):8258-8273. doi: 10.1109/TPAMI.2024.3402232. Epub 2024 Nov 6.
3. Uni-to-Multi Modal Knowledge Distillation for Bidirectional LiDAR-Camera Semantic Segmentation. IEEE Trans Pattern Anal Mach Intell. 2024 Dec;46(12):11059-11072. doi: 10.1109/TPAMI.2024.3451658. Epub 2024 Nov 6.
4. FGCN: Image-Fused Point Cloud Semantic Segmentation with Fusion Graph Convolutional Network. Sensors (Basel). 2023 Oct 9;23(19):8338. doi: 10.3390/s23198338.
5. A 3D hierarchical cross-modality interaction network using transformers and convolutions for brain glioma segmentation in MR images. Med Phys. 2024 Nov;51(11):8371-8389. doi: 10.1002/mp.17354. Epub 2024 Aug 13.
6. Multi-scale sparse convolution and point convolution adaptive fusion point cloud semantic segmentation method. Sci Rep. 2025 Feb 5;15(1):4372. doi: 10.1038/s41598-025-88905-5.
7. SwinCross: Cross-modal Swin transformer for head-and-neck tumor segmentation in PET/CT images. Med Phys. 2024 Mar;51(3):2096-2107. doi: 10.1002/mp.16703. Epub 2023 Sep 30.
8. An Efficient Ensemble Deep Learning Approach for Semantic Point Cloud Segmentation Based on 3D Geometric Features and Range Images. Sensors (Basel). 2022 Aug 18;22(16):6210. doi: 10.3390/s22166210.
9. CMAF-Net: a cross-modal attention fusion-based deep neural network for incomplete multi-modal brain tumor segmentation. Quant Imaging Med Surg. 2024 Jul 1;14(7):4579-4604. doi: 10.21037/qims-24-9. Epub 2024 Jun 27.
10. Point Cloud Semantic Segmentation Network Based on Multi-Scale Feature Fusion. Sensors (Basel). 2021 Feb 26;21(5):1625. doi: 10.3390/s21051625.
