EPMF: Efficient Perception-Aware Multi-Sensor Fusion for 3D Semantic Segmentation

Authors

Tan Mingkui, Zhuang Zhuangwei, Chen Sitao, Li Rong, Jia Kui, Wang Qicheng, Li Yuanqing

Publication

IEEE Trans Pattern Anal Mach Intell. 2024 Dec;46(12):8258-8273. doi: 10.1109/TPAMI.2024.3402232. Epub 2024 Nov 6.

Abstract

We study multi-sensor fusion for 3D semantic segmentation, which is important to scene understanding in many applications such as autonomous driving and robotics. Existing fusion-based methods, however, may not achieve promising performance due to the vast difference between the two modalities. In this work, we investigate a collaborative fusion scheme called perception-aware multi-sensor fusion (PMF) to effectively exploit perceptual information from the two modalities, namely, appearance information from RGB images and spatio-depth information from point clouds. To this end, we project point clouds to the camera coordinate system using perspective projection and process both the LiDAR and camera inputs in 2D space, while preventing information loss from the RGB images. Then, we propose a two-stream network to extract features from the two modalities separately; the extracted features are fused by effective residual-based fusion modules. Moreover, we introduce additional perception-aware losses to measure the perceptual difference between the two modalities. Finally, we propose an improved version of PMF, i.e., EPMF, which is more efficient and effective thanks to optimized data pre-processing and network architecture under perspective projection. Specifically, we propose cross-modal alignment and cropping to obtain tight inputs and reduce unnecessary computational cost. We then explore more efficient contextual modules under perspective projection and fuse the LiDAR features into the camera stream to boost the performance of the two-stream network. Extensive experiments on benchmark datasets show the superiority of our method. For example, on the nuScenes test set, our EPMF outperforms the state-of-the-art method RangeFormer by 0.9% in mIoU.
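
To make the projection step concrete, below is a minimal sketch of mapping LiDAR points into the camera image via perspective projection. The function name, the 4x4 LiDAR-to-camera extrinsic T_cam_from_lidar, and the 3x3 intrinsic K are illustrative assumptions, not names from the paper's code.

import numpy as np

def project_lidar_to_image(points_xyz, T_cam_from_lidar, K, img_h, img_w):
    # points_xyz:       (N, 3) LiDAR points in the LiDAR frame
    # T_cam_from_lidar: (4, 4) extrinsic transform from the LiDAR frame to the camera frame
    # K:                (3, 3) camera intrinsic matrix
    # Returns pixel coordinates (M, 2), depths (M,), and indices of the kept points.

    # Lift to homogeneous coordinates and transform into the camera frame.
    pts_h = np.hstack([points_xyz, np.ones((points_xyz.shape[0], 1))])
    pts_cam = (T_cam_from_lidar @ pts_h.T).T[:, :3]

    # Keep only points in front of the camera.
    in_front = pts_cam[:, 2] > 0.1
    pts_cam = pts_cam[in_front]

    # Perspective projection: apply the intrinsics, then divide by depth.
    uvw = (K @ pts_cam.T).T
    uv = uvw[:, :2] / uvw[:, 2:3]

    # Keep only projections that land inside the image.
    inside = (uv[:, 0] >= 0) & (uv[:, 0] < img_w) & (uv[:, 1] >= 0) & (uv[:, 1] < img_h)
    keep = np.flatnonzero(in_front)[inside]
    return uv[inside], pts_cam[inside, 2], keep

The residual-based fusion mentioned in the abstract could likewise be sketched as a small PyTorch module that injects a gated residual, computed from concatenated camera and LiDAR feature maps, into the camera stream; this is an assumption about the general idea, not the authors' implementation.

import torch
import torch.nn as nn

class ResidualFusion(nn.Module):
    # Fuses LiDAR features into camera features through a learned, gated residual.
    def __init__(self, cam_channels, lidar_channels):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Conv2d(cam_channels + lidar_channels, cam_channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(cam_channels),
            nn.ReLU(inplace=True),
        )
        # Gate controlling how much of the fused residual is injected.
        self.gate = nn.Sequential(
            nn.Conv2d(cam_channels, cam_channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, cam_feat, lidar_feat):
        # Both feature maps are assumed to share the same spatial resolution
        # after perspective projection and cross-modal alignment.
        residual = self.fuse(torch.cat([cam_feat, lidar_feat], dim=1))
        return cam_feat + self.gate(residual) * residual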

