Spatial-Temporal Graph Enhanced DETR Towards Multi-Frame 3D Object Detection.

Authors

Zhang Yifan, Zhu Zhiyu, Hou Junhui, Wu Dapeng

Publication information

IEEE Trans Pattern Anal Mach Intell. 2024 Dec;46(12):10614-10628. doi: 10.1109/TPAMI.2024.3443335. Epub 2024 Nov 6.

Abstract

The Detection Transformer (DETR) has revolutionized the design of CNN-based object detection systems, showcasing impressive performance. However, its potential in multi-frame 3D object detection remains largely unexplored. In this paper, we present STEMD, a novel end-to-end framework that enhances the DETR-like paradigm for multi-frame 3D object detection by addressing three aspects specific to this task. First, to model inter-object spatial interactions and complex temporal dependencies, we introduce a spatial-temporal graph attention network, which represents queries as nodes in a graph and enables effective modeling of object interactions within a social context. Second, to recover hard cases that are missing from the encoder's proposals in the current frame, we use the output of the previous frame to initialize the query input of the decoder. Third, the network struggles to distinguish the positive query from other highly similar queries that are not the best match; such queries are insufficiently suppressed and turn into redundant prediction boxes. To address this issue, our proposed IoU regularization term encourages similar queries to become distinct during refinement. Through extensive experiments, we demonstrate the effectiveness of our approach in handling challenging scenarios while incurring only minor additional computational overhead.
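The abstract does not give the form of the IoU regularization term, so the following is only a minimal sketch of the general idea: penalizing overlap between predicted boxes so that near-duplicate predictions are pushed apart. It assumes axis-aligned bird's-eye-view boxes written as (x_min, y_min, x_max, y_max) and uses the mean pairwise IoU as the penalty. The function names `bev_iou` and `pairwise_iou_penalty` are hypothetical and are not taken from the paper, which may use rotated 3D boxes and a different weighting.

```python
from itertools import combinations


def bev_iou(a, b):
    """IoU of two axis-aligned boxes given as (x_min, y_min, x_max, y_max).

    Assumption: boxes are axis-aligned in the bird's-eye view; rotation and
    height are ignored to keep the sketch short.
    """
    # Overlap extent along each axis, clamped at zero for disjoint boxes.
    overlap_x = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    overlap_y = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    intersection = overlap_x * overlap_y
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


def pairwise_iou_penalty(boxes):
    """Mean IoU over all unordered pairs of predicted boxes.

    Hypothetical stand-in for an IoU regularization term: the value is 0 when
    no two boxes overlap and grows as predictions collapse onto each other.
    """
    pairs = list(combinations(boxes, 2))
    if not pairs:
        return 0.0
    return sum(bev_iou(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    duplicates = [(0, 0, 2, 2), (0, 0, 2, 2), (10, 10, 12, 12)]
    separated = [(0, 0, 2, 2), (5, 5, 7, 7), (10, 10, 12, 12)]
    print(pairwise_iou_penalty(duplicates))
    print(pairwise_iou_penalty(separated))
```

In a training setup, a term of this kind would be added to the detection loss with a small weight, and a differentiable tensor implementation would replace the plain Python arithmetic used here.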
