Hu Weiming, Wang Shaoru, Zhou Zongwei, Gao Jin, Li Yangxi, Maybank Stephen
IEEE Trans Pattern Anal Mach Intell. 2024 Dec;46(12):11446-11463. doi: 10.1109/TPAMI.2024.3457886. Epub 2024 Nov 6.
The tracking-by-detection paradigm currently dominates multiple target tracking algorithms. It usually involves three tasks: target detection, appearance feature embedding, and data association. Carrying out these three tasks sequentially usually leads to low tracking efficiency. In this paper, we propose a one-stage anchor-free multi-task learning framework that carries out target detection and appearance feature embedding in parallel to substantially increase tracking speed. By sharing a pyramid of feature maps, this framework simultaneously predicts a detection and produces a feature embedding at each location. We propose a deformable local attention module that exploits the correlations between features at different locations within a target to obtain more discriminative features. We further propose a task-aware prediction module that uses deformable convolutions to select the most suitable locations for each task. At the selected locations, samples are classified as foreground or background, appearance features are embedded, and target boxes are regressed. Two effective training strategies, regression range overlapping and sample reweighting, are proposed to reduce missed detections in dense scenes. Ambiguous samples, whose identities are difficult to determine, are handled effectively to obtain a more accurate embedding of target appearance. An appearance-enhanced non-maximum suppression is proposed to reduce over-suppression of true targets in crowded scenes. Based on the one-stage anchor-free network with the deformable local attention module and the task-aware prediction module, we implement a new online multiple target tracker. Experimental results show that our tracker runs very fast while maintaining high tracking accuracy.
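The abstract does not say how appearance is combined with box overlap in the appearance-enhanced non-maximum suppression. The following is a minimal sketch of one plausible reading, not the authors' method. It assumes that a lower-scoring box is suppressed only when it both overlaps an already kept box (IoU above a hypothetical `iou_thr`) and has a similar appearance embedding (cosine similarity above a hypothetical `sim_thr`). Under that assumption, two heavily overlapping people with different appearance both survive, whereas plain IoU-based NMS would drop one. The function names and threshold values are illustrative.

```python
import math
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two embedding vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0


def appearance_nms(boxes: Sequence[Box], scores: Sequence[float],
                   embeddings: Sequence[Sequence[float]],
                   iou_thr: float = 0.5, sim_thr: float = 0.7) -> List[int]:
    """Greedy NMS in which suppression also requires similar appearance.

    ASSUMPTION (hypothetical rule, not from the paper): a box is dropped only
    if some kept box has IoU > iou_thr AND embedding similarity > sim_thr.
    Returns indices of kept boxes, in descending score order.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        duplicate = any(
            iou(boxes[i], boxes[k]) > iou_thr
            and cosine(embeddings[i], embeddings[k]) > sim_thr
            for k in keep
        )
        if not duplicate:
            keep.append(i)
    return keep


if __name__ == "__main__":
    boxes = [(0, 0, 10, 10), (1, 0, 11, 10), (1, 1, 11, 11)]
    scores = [0.9, 0.8, 0.7]
    # Boxes 0 and 1 look alike (same target); box 2 looks different.
    embeddings = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
    print(appearance_nms(boxes, scores, embeddings))
```

In this example box 1 overlaps box 0 and shares its embedding, so it is treated as a duplicate, while box 2 overlaps box 0 about as much but has a dissimilar embedding and is kept as a separate target.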