

CrossInteraction: Multi-Modal Interaction and Alignment Strategy for 3D Perception.

Authors

Zhao Weiyi, Liu Xinxin, Ding Yu

Affiliations

College of Automation, Nanjing University of Information Science and Technology, Nanjing 210044, China.

Jiangsu Collaborative Innovation Center of Atmospheric Environment and Equipment Technology (CICAEET), Nanjing 210044, China.

Publication

Sensors (Basel). 2025 Sep 16;25(18):5775. doi: 10.3390/s25185775.

Abstract

Cameras and LiDAR are the primary sensors used in contemporary 3D object perception, and they have driven the development of a variety of multi-modal detection algorithms for images, point clouds, and their fusion. Given the demanding accuracy requirements of autonomous driving environments, traditional multi-modal fusion techniques often overlook critical information from individual modalities and struggle to align transformed features effectively. In this paper, we introduce an improved modal interaction strategy, called CrossInteraction. This method strengthens the interaction between modalities by using the output of the first modal representation as the input to the second interaction-enhancement stage, yielding better overall interaction effects. To further address the challenge of feature alignment errors, we employ a graph convolutional network. Finally, the prediction process is completed through a cross-attention mechanism, ensuring more accurate detection outcomes.
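The cross-attention step described above can be sketched minimally as follows. This is an illustrative reconstruction, not the paper's implementation: the choice of camera features as queries and LiDAR features as keys/values, and all names and shapes, are assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, keys, values):
    """Scaled dot-product cross-attention between two modalities."""
    d = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d)   # (Nq, Nk) pairwise similarity
    weights = softmax(scores, axis=-1)       # each query attends over all keys
    return weights @ values                  # (Nq, d) fused features

rng = np.random.default_rng(0)
cam_feats = rng.standard_normal((5, 16))    # 5 camera tokens (queries)
lidar_feats = rng.standard_normal((8, 16))  # 8 LiDAR tokens (keys/values)
fused = cross_attention(cam_feats, lidar_feats, lidar_feats)
print(fused.shape)  # (5, 16)
```

One modality's features thus query the other's, so each fused token is a similarity-weighted mixture of the complementary modality's features — the mechanism the abstract relies on for the final prediction.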


Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8d3f/12473504/0ec9fa42026f/sensors-25-05775-g001.jpg
