AFTR: A Robustness Multi-Sensor Fusion Model for 3D Object Detection Based on Adaptive Fusion Transformer.

Authors

Zhang Yan, Liu Kang, Bao Hong, Qian Xu, Wang Zihan, Ye Shiqing, Wang Weicen

Affiliations

School of Artificial Intelligence, China University of Mining and Technology-Beijing, Beijing 100083, China.

College of Robotics, Beijing Union University, Beijing 100027, China.

Publication

Sensors (Basel). 2023 Oct 12;23(20):8400. doi: 10.3390/s23208400.

Abstract

Multi-modal sensors are key to the robust and accurate operation of autonomous driving systems, with LiDAR and cameras being the most important on-board sensors. However, current fusion methods face challenges from inconsistent multi-sensor data representations and from the misalignment introduced by dynamic scenes. Specifically, existing fusion methods either explicitly correlate multi-sensor features through calibration parameters, ignoring the feature blurring caused by misalignment, or associate features across sensors through global attention, incurring rapidly escalating computational costs. To address this, we propose a transformer-based end-to-end multi-sensor fusion framework named the adaptive fusion transformer (AFTR). AFTR consists of an adaptive spatial cross-attention (ASCA) mechanism and a spatial temporal self-attention (STSA) mechanism. ASCA adaptively associates and fuses multi-sensor features in 3D space through learnable local attention, alleviating geometric misalignment and reducing computational cost; STSA exchanges cross-temporal information using the learnable offsets of deformable attention, mitigating displacements caused by dynamic scenes. Extensive experiments show that AFTR achieves SOTA performance on the nuScenes 3D object detection task (74.9% NDS and 73.2% mAP) and is strongly robust to misalignment (only a 0.2% NDS drop under slight noise). Ablation studies further confirm the effectiveness of each AFTR component. In summary, the proposed AFTR is an accurate, efficient, and robust multi-sensor data fusion framework.
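For readers unfamiliar with the mechanism the abstract says STSA builds on, the following is a minimal, self-contained PyTorch sketch of single-head deformable attention with learnable sampling offsets. All names and hyperparameters here (DeformableAttentionSketch, num_points, the 0.1 offset scale) are illustrative assumptions for exposition, not the authors' AFTR implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DeformableAttentionSketch(nn.Module):
    """Single-head deformable attention over a 2D feature map (illustrative)."""

    def __init__(self, dim: int, num_points: int = 4):
        super().__init__()
        self.num_points = num_points
        # Each query predicts num_points (x, y) offsets and matching weights.
        self.offset_proj = nn.Linear(dim, num_points * 2)
        self.weight_proj = nn.Linear(dim, num_points)
        self.value_proj = nn.Linear(dim, dim)
        self.out_proj = nn.Linear(dim, dim)

    def forward(self, queries, ref_points, feat_map):
        # queries:    (B, N, C)    query embeddings
        # ref_points: (B, N, 2)    reference locations in [-1, 1] (grid_sample convention)
        # feat_map:   (B, C, H, W) feature map to sample from (e.g., a past frame)
        B, N, _ = queries.shape
        K = self.num_points

        # Project values; nn.Linear acts on the trailing channel dimension.
        value = self.value_proj(feat_map.permute(0, 2, 3, 1)).permute(0, 3, 1, 2)

        # Small learnable shifts around each reference point (0.1 scale is arbitrary).
        offsets = self.offset_proj(queries).view(B, N, K, 2).tanh() * 0.1
        weights = self.weight_proj(queries).softmax(dim=-1)  # (B, N, K)

        # Bilinearly sample only K deformed locations per query instead of
        # attending to all H*W positions -- the cost saving vs. global attention.
        grid = ref_points[:, :, None, :] + offsets                  # (B, N, K, 2)
        sampled = F.grid_sample(value, grid, align_corners=False)   # (B, C, N, K)

        # Weighted sum over the K sampled points, then output projection.
        fused = (sampled * weights[:, None, :, :]).sum(dim=-1)      # (B, C, N)
        return self.out_proj(fused.transpose(1, 2))                 # (B, N, C)
```

A toy usage under the same assumptions: 8 queries attending into a 32x32 feature map.

```python
attn = DeformableAttentionSketch(dim=64)
q = torch.randn(2, 8, 64)            # 8 queries per sample
ref = torch.rand(2, 8, 2) * 2 - 1    # reference points in [-1, 1]
fmap = torch.randn(2, 64, 32, 32)    # feature map to sample from
print(attn(q, ref, fmap).shape)      # torch.Size([2, 8, 64])
```

Because each query touches only K sampled points rather than the full feature map, the per-query cost is O(K) instead of O(HW), which is the efficiency argument the abstract makes against global attention.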

Figure 1 (from the paper): https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f8d4/10611098/1fb04fd87a5e/sensors-23-08400-g001.jpg
