

AFTR: A Robustness Multi-Sensor Fusion Model for 3D Object Detection Based on Adaptive Fusion Transformer.

Authors

Zhang Yan, Liu Kang, Bao Hong, Qian Xu, Wang Zihan, Ye Shiqing, Wang Weicen

Affiliations

School of Artificial Intelligence, China University of Mining and Technology-Beijing, Beijing 100083, China.

College of Robotics, Beijing Union University, Beijing 100027, China.

Publication

Sensors (Basel). 2023 Oct 12;23(20):8400. doi: 10.3390/s23208400.

DOI: 10.3390/s23208400
PMID: 37896496
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10611098/
Abstract

Multi-modal sensors are the key to ensuring the robust and accurate operation of autonomous driving systems, where LiDAR and cameras are important on-board sensors. However, current fusion methods face challenges due to inconsistent multi-sensor data representations and the misalignment of dynamic scenes. Specifically, current fusion methods either explicitly correlate multi-sensor data features by calibrating parameters, ignoring the feature blurring problems caused by misalignment, or find correlated features between multi-sensor data through global attention, causing rapidly escalating computational costs. On this basis, we propose a transformer-based end-to-end multi-sensor fusion framework named the adaptive fusion transformer (AFTR). The proposed AFTR consists of the adaptive spatial cross-attention (ASCA) mechanism and the spatial temporal self-attention (STSA) mechanism. Specifically, ASCA adaptively associates and interacts with multi-sensor data features in 3D space through learnable local attention, alleviating the problem of the misalignment of geometric information and reducing computational costs, and STSA interacts with cross-temporal information using learnable offsets in deformable attention, mitigating displacements due to dynamic scenes. We show through numerous experiments that the AFTR obtains SOTA performance in the nuScenes 3D object detection task (74.9% NDS and 73.2% mAP) and demonstrates strong robustness to misalignment (only a 0.2% NDS drop with slight noise). At the same time, we demonstrate the effectiveness of the AFTR components through ablation studies. In summary, the proposed AFTR is an accurate, efficient, and robust multi-sensor data fusion framework.
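Both mechanisms described above rest on the same idea: rather than attending globally, each query samples the feature map at a reference point plus a small set of learnable offsets and mixes the samples with learned attention weights. The sketch below illustrates that sampling step in one dimension; it is a minimal NumPy illustration of deformable attention in general, not the paper's implementation, and every name in it is invented for this example.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - np.max(x))
    return e / e.sum()

def deformable_attention_1d(feature_map, ref_point, offsets, attn_logits):
    """Sample a (L, C) feature map at ref_point + each offset via linear
    interpolation, then mix the K samples with softmax attention weights.

    feature_map : (L, C) array of L positions with C channels
    ref_point   : float position of the query's reference point
    offsets     : (K,) learnable sampling offsets
    attn_logits : (K,) learnable attention logits, one per sample
    """
    L, C = feature_map.shape
    weights = softmax(attn_logits)          # convex combination over K samples
    out = np.zeros(C)
    for off, w in zip(offsets, weights):
        p = np.clip(ref_point + off, 0.0, L - 1.0)
        lo = int(np.floor(p))
        hi = min(lo + 1, L - 1)
        frac = p - lo
        # Linear interpolation between the two nearest grid positions.
        sample = (1.0 - frac) * feature_map[lo] + frac * feature_map[hi]
        out += w * sample
    return out

# With a single zero offset the query simply reads the feature at its
# reference point; training nonzero offsets lets it attend elsewhere,
# which is how misaligned sensor features can still be associated.
fmap = np.arange(12, dtype=float).reshape(6, 2)   # L=6 positions, C=2 channels
out = deformable_attention_1d(fmap, ref_point=2.0,
                              offsets=np.array([0.0]),
                              attn_logits=np.array([0.0]))
```

Because the query only touches K sampled locations instead of all L positions, cost grows with the number of samples rather than the feature-map size, which is the efficiency argument the abstract makes against global attention.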


Figures:
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f8d4/10611098/1fb04fd87a5e/sensors-23-08400-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f8d4/10611098/f43798168b84/sensors-23-08400-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f8d4/10611098/ac926d0404de/sensors-23-08400-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f8d4/10611098/7eca22a7ec4a/sensors-23-08400-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f8d4/10611098/d85584959693/sensors-23-08400-g005.jpg

Similar Articles

1. AFTR: A Robustness Multi-Sensor Fusion Model for 3D Object Detection Based on Adaptive Fusion Transformer.
Sensors (Basel). 2023 Oct 12;23(20):8400. doi: 10.3390/s23208400.

2. BAFusion: Bidirectional Attention Fusion for 3D Object Detection Based on LiDAR and Camera.
Sensors (Basel). 2024 Jul 20;24(14):4718. doi: 10.3390/s24144718.

3. IRBEVF-Q: Optimization of Image-Radar Fusion Algorithm Based on Bird's Eye View Features.
Sensors (Basel). 2024 Jul 16;24(14):4602. doi: 10.3390/s24144602.

4. PTA-Det: Point Transformer Associating Point Cloud and Image for 3D Object Detection.
Sensors (Basel). 2023 Mar 17;23(6):3229. doi: 10.3390/s23063229.

5. CI3D: Context Interaction for Dynamic Objects and Static Map Elements in 3D Driving Scenes.
IEEE Trans Image Process. 2024;33:2867-2879. doi: 10.1109/TIP.2023.3340607. Epub 2024 Apr 15.

6. Dense projection fusion for 3D object detection.
Sci Rep. 2024 Oct 8;14(1):23492. doi: 10.1038/s41598-024-74679-9.

7. AEPF: Attention-Enabled Point Fusion for 3D Object Detection.
Sensors (Basel). 2024 Sep 9;24(17):5841. doi: 10.3390/s24175841.

8. EPMF: Efficient Perception-Aware Multi-Sensor Fusion for 3D Semantic Segmentation.
IEEE Trans Pattern Anal Mach Intell. 2024 Dec;46(12):8258-8273. doi: 10.1109/TPAMI.2024.3402232. Epub 2024 Nov 6.

9. AMFF-Net: An Effective 3D Object Detector Based on Attention and Multi-Scale Feature Fusion.
Sensors (Basel). 2023 Nov 22;23(23):9319. doi: 10.3390/s23239319.

10. Graph-DETR4D: Spatio-Temporal Graph Modeling for Multi-View 3D Object Detection.
IEEE Trans Image Process. 2024;33:4488-4500. doi: 10.1109/TIP.2024.3430473. Epub 2024 Aug 21.

Cited By

1. Exploring the Unseen: A Survey of Multi-Sensor Fusion and the Role of Explainable AI (XAI) in Autonomous Vehicles.
Sensors (Basel). 2025 Jan 31;25(3):856. doi: 10.3390/s25030856.

2. Glaucoma detection model by exploiting multi-region and multi-scan-pattern OCT images with dynamical region score.
Biomed Opt Express. 2024 Feb 2;15(3):1370-1392. doi: 10.1364/BOE.512138. eCollection 2024 Mar 1.

References

1. Self-driving cars: A city perspective.
Sci Robot. 2019 Mar 27;4(28). doi: 10.1126/scirobotics.aav9843.

2. Life and death decisions of autonomous vehicles.
Nature. 2020 Mar;579(7797):E1-E2. doi: 10.1038/s41586-020-1987-4. Epub 2020 Mar 4.

3. SECOND: Sparsely Embedded Convolutional Detection.
Sensors (Basel). 2018 Oct 6;18(10):3337. doi: 10.3390/s18103337.