VisEvent: Reliable Object Tracking via Collaboration of Frame and Event Flows

Author Information

Wang Xiao, Li Jianing, Zhu Lin, Zhang Zhipeng, Chen Zhe, Li Xin, Wang Yaowei, Tian Yonghong, Wu Feng

Publication Information

IEEE Trans Cybern. 2024 Mar;54(3):1997-2010. doi: 10.1109/TCYB.2023.3318601. Epub 2024 Feb 9.

Abstract

Unlike visible cameras, which record intensity images frame by frame, the biologically inspired event camera produces a stream of asynchronous, sparse events with much lower latency. In practice, visible cameras better perceive texture details and slow motion, while event cameras are free from motion blur and have a larger dynamic range, which enables them to work well under fast motion and low illumination (LI). The two sensors can therefore cooperate to achieve more reliable object tracking. In this work, we propose a large-scale visible-event benchmark (termed VisEvent) to address the lack of a realistic dataset of sufficient scale for this task. Our dataset consists of 820 video pairs captured under LI, high-speed, and background-clutter scenarios, and it is divided into training and testing subsets containing 500 and 320 videos, respectively. Based on VisEvent, we transform the event flows into event images and construct more than 30 baseline methods by extending current single-modality trackers into dual-modality versions. More importantly, we build a simple yet effective tracking algorithm by proposing a cross-modality transformer to achieve more effective feature fusion between visible and event data. Extensive experiments on the proposed VisEvent dataset, FE108, COESOT, and two simulated datasets (i.e., OTB-DVS and VOT-DVS) validate the effectiveness of our model. The dataset and source code have been released at: https://github.com/wangxiao5791509/VisEvent_SOT_Benchmark.
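The event-to-image conversion mentioned in the abstract is a common preprocessing step for feeding asynchronous events into frame-based trackers. Below is a minimal sketch of one such scheme, assuming events arrive as (x, y, timestamp, polarity) tuples; the column layout and rendering choices are illustrative assumptions, as the paper does not specify them here.

```python
import numpy as np

def events_to_image(events, height, width):
    """Accumulate one temporal slice of events into a simple event image.

    `events` is assumed to be an (N, 4) array with columns
    (x, y, timestamp, polarity), polarity in {+1, -1}. This column
    order and the two-channel rendering are illustrative guesses,
    not the paper's exact representation.
    """
    img = np.zeros((height, width, 3), dtype=np.uint8)
    x = events[:, 0].astype(int)
    y = events[:, 1].astype(int)
    p = events[:, 3]
    # Render positive-polarity events into the red channel and
    # negative-polarity events into the blue channel.
    img[y[p > 0], x[p > 0], 0] = 255
    img[y[p < 0], x[p < 0], 2] = 255
    return img
```

The cross-modality transformer is likewise described only at a high level. The following PyTorch sketch illustrates the general idea of bidirectional cross-attention between visible and event feature tokens; the class name CrossModalityFusion, the dimensions, and the residual layout are hypothetical, not the authors' published architecture.

```python
import torch
import torch.nn as nn

class CrossModalityFusion(nn.Module):
    """Bidirectional cross-attention between visible and event tokens.

    A hypothetical sketch of a cross-modality transformer block:
    each modality queries the other, and the fused tokens are
    concatenated for a downstream tracking head.
    """

    def __init__(self, dim=256, num_heads=8):
        super().__init__()
        self.vis_attends_evt = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.evt_attends_vis = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm_vis = nn.LayerNorm(dim)
        self.norm_evt = nn.LayerNorm(dim)

    def forward(self, vis_tokens, evt_tokens):
        # Both inputs: (batch, num_tokens, dim) feature sequences.
        # Each modality queries the other, then adds the result residually.
        v_fused, _ = self.vis_attends_evt(vis_tokens, evt_tokens, evt_tokens)
        e_fused, _ = self.evt_attends_vis(evt_tokens, vis_tokens, vis_tokens)
        vis_out = self.norm_vis(vis_tokens + v_fused)
        evt_out = self.norm_evt(evt_tokens + e_fused)
        # Concatenate along the token axis for a downstream tracking head.
        return torch.cat([vis_out, evt_out], dim=1)

if __name__ == "__main__":
    fusion = CrossModalityFusion()
    vis = torch.randn(2, 196, 256)  # tokens from a visible-frame backbone
    evt = torch.randn(2, 196, 256)  # tokens from an event-image backbone
    print(fusion(vis, evt).shape)   # torch.Size([2, 392, 256])
```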

