SODFormer: Streaming Object Detection With Transformer Using Events and Frames.

Authors

Li Dianze, Tian Yonghong, Li Jianing

Publication

IEEE Trans Pattern Anal Mach Intell. 2023 Nov;45(11):14020-14037. doi: 10.1109/TPAMI.2023.3298925. Epub 2023 Oct 3.

Abstract

The DAVIS camera, which streams two complementary sensing modalities (asynchronous events and frames), has gradually been used to address major object detection challenges (e.g., fast motion blur and low light). However, how to effectively leverage rich temporal cues and fuse two heterogeneous visual streams remains a challenging endeavor. To address this challenge, we propose a novel streaming object detector with Transformer, namely SODFormer, which first integrates events and frames to continuously detect objects in an asynchronous manner. Technically, we first build a large-scale multimodal neuromorphic object detection dataset (i.e., PKU-DAVIS-SOD) with over 1080.1k manual labels. Then, we design a spatiotemporal Transformer architecture to detect objects via an end-to-end sequence prediction problem, where the novel temporal Transformer module leverages rich temporal cues from two visual streams to improve detection performance. Finally, an asynchronous attention-based fusion module is proposed to integrate the two heterogeneous sensing modalities and take complementary advantages from each, which can be queried at any time to locate objects and break through the limited output frequency of synchronized frame-based fusion strategies. The results show that the proposed SODFormer outperforms four state-of-the-art methods and our eight baselines by a significant margin. We also show that our unifying framework works well even in cases where the conventional frame-based camera fails, e.g., high-speed motion and low-light conditions. Our dataset and code are available at https://github.com/dianzl/SODFormer.
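To make the idea of attention-based fusion of the two modalities concrete, here is a minimal NumPy sketch of scaled dot-product cross-attention, where frame tokens act as queries over event tokens. This is an illustrative toy, not the paper's implementation: the function names, feature shapes, and the single-head, residual-add design are assumptions for demonstration only.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax along the given axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention_fuse(frame_feats, event_feats):
    """Fuse frame tokens (queries) with event tokens (keys/values)
    via single-head scaled dot-product cross-attention.

    frame_feats: (Nf, d) array of frame-stream features
    event_feats: (Ne, d) array of event-stream features
    returns:     (Nf, d) fused features with a residual connection
    """
    d = frame_feats.shape[-1]
    scores = frame_feats @ event_feats.T / np.sqrt(d)  # (Nf, Ne)
    weights = softmax(scores, axis=-1)                 # rows sum to 1
    fused = weights @ event_feats                      # (Nf, d)
    return frame_feats + fused                         # residual add

# toy example: 4 frame tokens, 6 event tokens, 8-dim features
rng = np.random.default_rng(0)
frames = rng.standard_normal((4, 8))
events = rng.standard_normal((6, 8))
out = cross_attention_fuse(frames, events)
print(out.shape)  # (4, 8)
```

Because the event stream is asynchronous, such a module can in principle be queried with event features aggregated up to any timestamp, which is the property the abstract highlights over synchronized frame-based fusion.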

