Fu Yingkai, Li Meng, Liu Wenxi, Wang Yuanchen, Zhang Jiqing, Yin Baocai, Wei Xiaopeng, Yang Xin
IEEE Trans Image Process. 2023;32:6129-6141. doi: 10.1109/TIP.2023.3326683. Epub 2023 Nov 8.
Event cameras, or dynamic vision sensors, have recently achieved success in tasks ranging from fundamental vision to high-level vision research. Because they asynchronously capture changes in light intensity, event cameras have an inherent advantage in capturing moving objects in challenging scenarios, including objects under low light, high dynamic range, or fast motion. Event cameras are therefore a natural fit for visual object tracking. However, current event-based trackers derived from RGB trackers simply replace the input images with event frames and still follow the conventional tracking pipeline, which relies mainly on object texture to distinguish the target. As a result, these trackers may not be robust in challenging scenarios such as moving cameras and cluttered foregrounds. In this paper, we propose a distractor-aware event-based tracker that introduces transformer modules into a Siamese network architecture (named DANet). Specifically, our model is mainly composed of a motion-aware network and a target-aware network, which jointly exploit motion cues and object contours from event data so as to discover moving objects and identify the target object by removing dynamic distractors. Our DANet can be trained in an end-to-end manner without any post-processing and runs at over 80 FPS on a single V100. We conduct comprehensive experiments on two large event tracking datasets to validate the proposed model, and demonstrate that our tracker outperforms state-of-the-art trackers in both accuracy and efficiency.
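To make the dual-branch design concrete, the following is a minimal PyTorch sketch of a Siamese tracker with a transformer fusion module, in the spirit of the architecture the abstract describes. It is an illustrative assumption, not the authors' released DANet code: the module names (ConvBackbone, SiameseTransformerTracker), the two 1x1-conv heads standing in for the motion-aware and target-aware predictions, and all shapes are hypothetical simplifications.

    # Hedged sketch: Siamese backbone + transformer fusion over event frames.
    # Module names, heads, and shapes are illustrative, not the paper's code.
    import torch
    import torch.nn as nn


    class ConvBackbone(nn.Module):
        """Shared Siamese backbone over (here: 3-channel) event tensors."""
        def __init__(self, out_dim=128):
            super().__init__()
            self.net = nn.Sequential(
                nn.Conv2d(3, 64, 7, stride=2, padding=3), nn.ReLU(inplace=True),
                nn.Conv2d(64, out_dim, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            )

        def forward(self, x):
            return self.net(x)


    class SiameseTransformerTracker(nn.Module):
        def __init__(self, dim=128, heads=4, layers=2):
            super().__init__()
            self.backbone = ConvBackbone(dim)  # weights shared across both inputs
            enc = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
            self.fusion = nn.TransformerEncoder(enc, num_layers=layers)
            # Two heads standing in for the motion-aware and target-aware
            # branches (a hypothetical simplification of the dual networks).
            self.motion_head = nn.Conv2d(dim, 1, 1)
            self.target_head = nn.Conv2d(dim, 1, 1)

        def forward(self, template, search):
            z = self.backbone(template)            # (B, C, Hz, Wz)
            x = self.backbone(search)              # (B, C, Hx, Wx)
            b, c, hx, wx = x.shape
            # Flatten both feature maps into token sequences and let the
            # transformer attend jointly across template and search tokens.
            tokens = torch.cat(
                [z.flatten(2).transpose(1, 2), x.flatten(2).transpose(1, 2)], dim=1
            )
            fused = self.fusion(tokens)
            # Keep only the search-region tokens for the prediction heads.
            fx = fused[:, -hx * wx:].transpose(1, 2).reshape(b, c, hx, wx)
            motion_map = self.motion_head(fx)      # where things are moving
            target_map = self.target_head(fx)      # which mover is the target
            return motion_map, target_map


    if __name__ == "__main__":
        model = SiameseTransformerTracker()
        template = torch.randn(1, 3, 128, 128)     # event frame around the target
        search = torch.randn(1, 3, 256, 256)       # event frame of search region
        motion, target = model(template, search)
        print(motion.shape, target.shape)          # torch.Size([1, 1, 64, 64]) each

In this reading, the motion map would highlight all moving regions (including dynamic distractors), while the target map would single out the tracked object, matching the abstract's idea of discovering moving objects first and then suppressing distractors.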