Yu Hongqi, Zhang Xiaoqin, Zhou Xiaolong, Chan Sixian
Key Laboratory of Intelligent Informatics for Safety & Emergency of Zhejiang Province, Wenzhou University, Wenzhou, 325035, Zhejiang, China.
The College of Electrical and Information Engineering, Quzhou University, Quzhou, 324000, Zhejiang, China.
Neural Netw. 2025 Nov;191:107818. doi: 10.1016/j.neunet.2025.107818. Epub 2025 Jul 5.
3D object detection is crucial for autonomous driving, enabling accurate object classification and localization in the real world. Existing methods typically rely on basic element-wise operations to fuse multi-modal features from point clouds and images, limiting the effective learning of camera semantics and LiDAR spatial information. Additionally, the inherent sparsity of point clouds leads to distribution imbalances in receptive fields, and the complexity of 3D objects conceals implicit relational contexts. To address these limitations, we propose CIDRA-Net, a cross-modal interaction fusion network with distribution-relation awareness. First, we introduce a region cross-modal interaction fusion (RCIF) module that combines LiDAR features with camera depth information through dual-modal attention. We then separate and enhance two distribution-level features using a dual-branch distribution perception (DBDP) module to learn point distributions. Additionally, a global-local relation mining (GLRM) strategy is employed to capture both local and global contextual information for better object understanding and refined regression tasks. Our approach achieves state-of-the-art performance on the nuScenes and KITTI benchmarks while demonstrating strong generalization across backbones and robustness against sensor errors.
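The abstract does not include implementation details, but a minimal sketch can illustrate the kind of dual-modal attention fusion it describes for the RCIF module. Everything below is an assumption for illustration only: the class name DualModalAttentionFusion, the tensor shapes, and the use of PyTorch's nn.MultiheadAttention are not taken from the paper, and the actual RCIF design may differ substantially.

import torch
import torch.nn as nn


class DualModalAttentionFusion(nn.Module):
    """Hypothetical fusion block: LiDAR tokens attend to camera tokens
    (and vice versa), and the two attended views are merged on the
    LiDAR stream with a residual connection."""

    def __init__(self, dim: int = 256, num_heads: int = 8):
        super().__init__()
        # Cross-attention in both directions between the two modalities.
        self.lidar_to_cam = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.cam_to_lidar = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.fuse = nn.Sequential(
            nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, dim)
        )
        self.norm = nn.LayerNorm(dim)

    def forward(self, lidar_feat: torch.Tensor, cam_feat: torch.Tensor) -> torch.Tensor:
        # lidar_feat: (B, N_lidar, C) flattened LiDAR (e.g. BEV) tokens
        # cam_feat:   (B, N_cam, C)   flattened camera tokens carrying depth cues
        lidar_attended, _ = self.lidar_to_cam(lidar_feat, cam_feat, cam_feat)
        cam_attended, _ = self.cam_to_lidar(cam_feat, lidar_feat, lidar_feat)
        # Aligning camera tokens to LiDAR resolution is nontrivial in practice;
        # here we simply mean-pool them into a global camera context.
        cam_context = cam_attended.mean(dim=1, keepdim=True).expand_as(lidar_attended)
        fused = self.fuse(torch.cat([lidar_attended, cam_context], dim=-1))
        return self.norm(fused + lidar_feat)  # residual on the LiDAR stream


if __name__ == "__main__":
    block = DualModalAttentionFusion(dim=256, num_heads=8)
    lidar = torch.randn(2, 1024, 256)  # e.g. flattened BEV grid tokens
    cam = torch.randn(2, 600, 256)     # e.g. flattened image feature tokens
    print(block(lidar, cam).shape)     # torch.Size([2, 1024, 256])

The design choice sketched here, keeping the LiDAR stream as the residual backbone and injecting camera information as attended context, is a common pattern in LiDAR-camera fusion; whether CIDRA-Net follows it is not stated in the abstract.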