Han Shoudong, Wang Hongwei, Yu En, Hu Zhuo
National Key Laboratory of Science and Technology on Multispectral Information Processing, School of Artificial Intelligence and Automation, Huazhong University of Science and Technology, Wuhan 430074, China.
School of Automation, Wuhan University of Technology, Wuhan 430070, China.
Fundam Res. 2023 Feb 24;5(3):1214-1220. doi: 10.1016/j.fmre.2023.02.003. eCollection 2025 May.
Although the joint-detection-and-tracking paradigm has significantly advanced multi-object tracking (MOT), the long-term occlusion problem remains unsolved. After a trajectory has been inactive for a period because of occlusion, its appearance features are no longer reliable, which makes it difficult to reconnect the trajectory with them. Motion cues do not suffer from occlusion in the same way, but the commonly used Kalman Filter is also ineffective for long-term inertia prediction when observation updates are missing or wrong. In addition, occlusion tends to give multiple track-detection pairs close similarity scores during the data association phase, so directly using the Hungarian algorithm to obtain the globally optimal solution may cause identity switches. In this paper, we propose the Long-term Spatio-Temporal Prediction (LSTP) module and the Ordered Association (OA) module to alleviate the occlusion problem in motion prediction and data association, respectively. The LSTP module estimates the states of all tracked objects over time using a combination of spatial and temporal Transformers. The spatial Transformer models crowd interaction and learns the influence of neighbors, while the temporal Transformer models the temporal continuity of historical trajectories. The LSTP module also predicts the visibility of each motion prediction box, which denotes the occlusion attribute of the corresponding trajectory. Based on the occlusion attribute and the active state, the OA module defines an association priority and associates trajectories in that order, which helps to alleviate identity switches. Comprehensive experiments on the MOT17 and MOT20 benchmarks indicate the superiority of the proposed MOT framework, named Occlusion-Robust Tracker (ORT). Without using any appearance information, ORT achieves competitive performance, surpassing other state-of-the-art trackers in terms of trajectory accuracy and purity.
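The idea of associating trajectories in priority order, rather than in one global assignment, can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the priority rule, so the three tiers below (active and visible, then active and occluded, then inactive), the `vis_thresh` and `min_iou` values, the use of IoU as the similarity score, and all function and field names are assumptions made for illustration. A small exhaustive search stands in for the Hungarian algorithm so that the example needs only the standard library.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def best_assignment(track_idx, det_idx, score):
    """Exhaustive optimal one-to-one matching that maximizes total score.

    Stand-in for the Hungarian algorithm; exponential, so small inputs only.
    `score` maps admissible (track, detection) index pairs to a similarity.
    """
    if not track_idx:
        return 0.0, []
    t, rest = track_idx[0], track_idx[1:]
    best_total, best_pairs = best_assignment(rest, det_idx, score)  # t unmatched
    for d in det_idx:
        if (t, d) in score:
            remaining = [x for x in det_idx if x != d]
            total, pairs = best_assignment(rest, remaining, score)
            total += score[(t, d)]
            if total > best_total:
                best_total, best_pairs = total, [(t, d)] + pairs
    return best_total, best_pairs


def priority(track, vis_thresh=0.5):
    """Assumed priority tiers (lower value is associated earlier)."""
    if track["active"] and track["visibility"] >= vis_thresh:
        return 0  # active and predicted visible
    if track["active"]:
        return 1  # active but predicted occluded
    return 2      # inactive (lost) trajectory


def ordered_associate(tracks, detections, min_iou=0.3):
    """Match tracks to detections tier by tier; matched detections are consumed."""
    free = list(range(len(detections)))
    matches = {}
    for level in sorted({priority(t) for t in tracks}):
        group = [i for i, t in enumerate(tracks) if priority(t) == level]
        score = {}
        for i in group:
            for j in free:
                s = iou(tracks[i]["box"], detections[j])
                if s >= min_iou:  # gate out implausible pairs
                    score[(i, j)] = s
        _, pairs = best_assignment(group, free, score)
        for i, j in pairs:
            matches[tracks[i]["id"]] = j
            free.remove(j)
    return matches, free


# Hypothetical predicted boxes; B is occluded yet overlaps detection 0 slightly more than A.
tracks = [
    {"id": "A", "box": (12, 10, 52, 90), "active": True, "visibility": 0.9},
    {"id": "B", "box": (11, 10, 51, 90), "active": True, "visibility": 0.2},
    {"id": "C", "box": (205, 20, 245, 100), "active": False, "visibility": 0.0},
]
detections = [(10, 10, 50, 90), (200, 20, 240, 100)]

matches, free = ordered_associate(tracks, detections)
print(matches, free)
```

In this toy scene, detection 0 overlaps the occluded track B slightly more than the visible track A (IoU of roughly 0.95 versus 0.90), so a single global assignment over all tracks would hand it to B. The tiered version matches A first and leaves B unmatched for this frame, which is the kind of ambiguity the abstract attributes to occlusion.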