Graph-DETR4D: Spatio-Temporal Graph Modeling for Multi-View 3D Object Detection.

Author information

Chen Zehui, Chen Zheng, Li Zhenyu, Zhang Shiquan, Fang Liangji, Jiang Qinhong, Wu Feng, Zhao Feng

Publication information

IEEE Trans Image Process. 2024;33:4488-4500. doi: 10.1109/TIP.2024.3430473. Epub 2024 Aug 21.

DOI:10.1109/TIP.2024.3430473
PMID:39093681
Abstract

Multi-View 3D object detection (MV3D) has made tremendous progress by leveraging features from the multiple perspectives captured by surrounding cameras. Despite its promise in various applications, accurately detecting objects in 3D space from camera views remains extremely difficult because monocular depth estimation is ill-posed. Graph-DETR3D recently introduced a novel graph-based 3D-2D query paradigm for aggregating multi-view images in 3D object detection and achieved competitive performance. Although it enriches the query representations with 2D image features through a learnable 3D graph, its depth and velocity estimation remain limited because it uses a single-frame input setting. To solve this problem, we introduce a unified spatial-temporal graph modeling framework that fully leverages multi-view imagery cues under a multi-frame input setting. Thanks to the flexibility and sparsity of the dynamic graph architecture, we lift the original 3D graph into 4D space with an effective attention mechanism that automatically perceives imagery information at both the spatial and temporal levels. Moreover, since the main latency bottleneck lies in the image backbone, we propose a novel dense-sparse distillation framework for multi-view 3D object detection that reduces the computational budget without sacrificing detection accuracy, making the detector more suitable for real-world deployment. To this end, we propose Graph-DETR4D, a faster and stronger multi-view 3D object detection framework built on top of Graph-DETR3D. Extensive experiments on the nuScenes and Waymo benchmarks demonstrate the effectiveness and efficiency of Graph-DETR4D. Notably, our best model achieves 62.0% NDS on the nuScenes test leaderboard. Code is available at https://github.com/zehuichen123/Graph-DETR4D.
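The abstract describes lifting Graph-DETR3D's per-query 3D graph into 4D. Image features are gathered across camera views and across past frames, then fused with attention. The sketch below illustrates that general idea under stated assumptions. It is not the authors' implementation, which is in their repository.

The following names and choices are hypothetical:
- Each query has a 3D reference point plus a small set of graph-node offsets (`node_offsets`).
- Nodes are moved into each past frame with a known ego-motion transform (`ego_to_past`).
- Each node is projected into every camera with a 3x4 projection matrix and bilinearly sampled from that camera's feature map.
- All valid (frame, camera, node) samples are fused with a single softmax.
- The attention logits are passed in directly. A trained model would predict them from the query embedding.

```python
import numpy as np


def project_points(points_xyz, proj):
    """Project N x 3 points with a 3 x 4 projection matrix (intrinsics @ [R | t]).

    Returns N x 2 pixel coordinates (x, y) and the N camera-frame depths.
    """
    homo = np.concatenate([points_xyz, np.ones((len(points_xyz), 1))], axis=1)  # N x 4
    cam = homo @ proj.T                                                         # N x 3
    depth = cam[:, 2]
    uv = cam[:, :2] / np.clip(depth[:, None], 1e-5, None)  # guard against division by ~0
    return uv, depth


def bilinear_sample(feat, uv):
    """Bilinearly sample an H x W x C feature map at N pixel coordinates (x, y)."""
    h, w, _ = feat.shape
    x = np.clip(uv[:, 0], 0, w - 1)
    y = np.clip(uv[:, 1], 0, h - 1)
    x0, y0 = np.floor(x).astype(int), np.floor(y).astype(int)
    x1, y1 = np.minimum(x0 + 1, w - 1), np.minimum(y0 + 1, h - 1)
    wx, wy = (x - x0)[:, None], (y - y0)[:, None]
    top = feat[y0, x0] * (1 - wx) + feat[y0, x1] * wx
    bottom = feat[y1, x0] * (1 - wx) + feat[y1, x1] * wx
    return top * (1 - wy) + bottom * wy


def aggregate_query_features(ref_point, node_offsets, feats, projs, ego_to_past, logits):
    """Fuse features for one query over frames (T), cameras (V) and graph nodes (K).

    ref_point:    (3,) reference point in the current ego frame.
    node_offsets: (K, 3) hypothetical graph-node offsets around the reference point.
    feats:        (T, V, H, W, C) feature maps; frame 0 is the current frame.
    projs:        (T, V, 3, 4) ego-to-image projection for each frame and camera.
    ego_to_past:  (T, 4, 4) current-ego to frame-t-ego transforms (identity for t = 0).
    logits:       (T, V, K) attention logits (a real model would predict these).
    """
    T, V, H, W, C = feats.shape
    nodes = ref_point[None, :] + node_offsets                    # K x 3 graph nodes
    nodes_h = np.concatenate([nodes, np.ones((len(nodes), 1))], axis=1)
    sampled = np.zeros((T, V, len(nodes), C))
    valid = np.zeros((T, V, len(nodes)), dtype=bool)
    for t in range(T):
        nodes_t = (nodes_h @ ego_to_past[t].T)[:, :3]            # nodes in frame t's ego frame
        for v in range(V):
            uv, depth = project_points(nodes_t, projs[t, v])
            # A sample is valid only if it lies in front of the camera and inside the image.
            valid[t, v] = ((depth > 1e-5)
                           & (uv[:, 0] >= 0) & (uv[:, 0] <= W - 1)
                           & (uv[:, 1] >= 0) & (uv[:, 1] <= H - 1))
            sampled[t, v] = bilinear_sample(feats[t, v], uv)
    if not valid.any():
        return np.zeros(C)                                       # query sees nothing
    masked = np.where(valid, logits, -np.inf)                    # drop invalid samples
    weights = np.exp(masked - masked.max())
    weights /= weights.sum()                                     # softmax over (t, v, k)
    return np.einsum("tvk,tvkc->c", weights, sampled)
```

Time is treated as one more axis of the softmax. This is one simple way to let attention decide how much to rely on past observations. Samples that land behind a camera or outside the image are masked before the softmax, so they receive zero weight.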

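The headline 62.0% result is reported as NDS, the nuScenes Detection Score. NDS combines mAP with five true-positive error metrics: translation (mATE), scale (mASE), orientation (mAOE), velocity (mAVE) and attribute (mAAE). The velocity term is where single-frame detectors tend to struggle, as the abstract notes.

The sketch below implements the standard definition, in which each error is clipped at 1. The function name is hypothetical and not part of the nuScenes devkit API. The input values are illustrative only, not results from the paper.

```python
def nuscenes_detection_score(m_ap, tp_errors):
    """NDS = (5 * mAP + sum over the five TP metrics of (1 - min(1, mTP))) / 10."""
    expected = {"mATE", "mASE", "mAOE", "mAVE", "mAAE"}
    if set(tp_errors) != expected:
        raise ValueError(f"expected TP metrics {sorted(expected)}")
    # Each TP error is clipped at 1, so very large errors contribute 0 rather than a negative score.
    tp_scores = sum(1.0 - min(1.0, err) for err in tp_errors.values())
    return (5.0 * m_ap + tp_scores) / 10.0


example = {"mATE": 0.5, "mASE": 0.5, "mAOE": 0.5, "mAVE": 0.5, "mAAE": 0.5}
print(nuscenes_detection_score(0.5, example))  # prints 0.5
```

mAP carries half of the total weight. Each of the five error terms can add at most 0.1 to the score.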

Similar articles

1. Graph-DETR4D: Spatio-Temporal Graph Modeling for Multi-View 3D Object Detection.
IEEE Trans Image Process. 2024;33:4488-4500. doi: 10.1109/TIP.2024.3430473. Epub 2024 Aug 21.
2. Spatial-Temporal Graph Enhanced DETR Towards Multi-Frame 3D Object Detection.
IEEE Trans Pattern Anal Mach Intell. 2024 Dec;46(12):10614-10628. doi: 10.1109/TPAMI.2024.3443335. Epub 2024 Nov 6.
3. Monocular Quasi-Dense 3D Object Tracking.
IEEE Trans Pattern Anal Mach Intell. 2023 Feb;45(2):1992-2008. doi: 10.1109/TPAMI.2022.3168781. Epub 2023 Jan 6.
4. CI3D: Context Interaction for Dynamic Objects and Static Map Elements in 3D Driving Scenes.
IEEE Trans Image Process. 2024;33:2867-2879. doi: 10.1109/TIP.2023.3340607. Epub 2024 Apr 15.
5. Divide and Conquer: Improving Multi-Camera 3D Perception With 2D Semantic-Depth Priors and Input-Dependent Queries.
IEEE Trans Image Process. 2024;33:897-909. doi: 10.1109/TIP.2024.3352808. Epub 2024 Jan 23.
6. OBMO: One Bounding Box Multiple Objects for Monocular 3D Object Detection.
IEEE Trans Image Process. 2023 Nov 21;PP. doi: 10.1109/TIP.2023.3333225.
7. Weakly Supervised Monocular 3D Object Detection by Spatial-Temporal View Consistency.
IEEE Trans Pattern Anal Mach Intell. 2025 Jan;47(1):84-98. doi: 10.1109/TPAMI.2024.3466915. Epub 2024 Dec 4.
8. Fully Sparse Fusion for 3D Object Detection.
IEEE Trans Pattern Anal Mach Intell. 2024 Nov;46(11):7217-7231. doi: 10.1109/TPAMI.2024.3392303. Epub 2024 Oct 3.
9. Graph Neural Network and Spatiotemporal Transformer Attention for 3D Video Object Detection From Point Clouds.
IEEE Trans Pattern Anal Mach Intell. 2023 Aug;45(8):9822-9835. doi: 10.1109/TPAMI.2021.3125981. Epub 2023 Jun 30.
10. AFTR: A Robustness Multi-Sensor Fusion Model for 3D Object Detection Based on Adaptive Fusion Transformer.
Sensors (Basel). 2023 Oct 12;23(20):8400. doi: 10.3390/s23208400.