Li Ye, Zhan Li, Wu Lei, Liu Hongkun, Min Jianli, Wang Xinzhong
Shenzhen Institute of Information Technology, Shenzhen, China.
University of Electronic Science and Technology of China, Chengdu, China.
Sci Rep. 2025 Jul 2;15(1):22533. doi: 10.1038/s41598-025-07276-z.
Pedestrian multi-object tracking (MOT) continuously detects and localizes target pedestrians in a video sequence and associates them across frames; it has broad applications in fields such as intelligent surveillance and video analysis. Most current methods use a Kalman filter to model motion and predict each pedestrian's position in the next frame, then use the Hungarian algorithm to match targets between frames based on the Intersection over Union (IoU) between the predicted and detected bounding boxes. However, data association that relies solely on spatial information such as IoU cannot keep pedestrian identities consistent under severe occlusion, nor keep them stable when different pedestrians are close together. Pedestrian appearance information can effectively address these problems: even when a pedestrian's position is lost for a number of frames due to occlusion, the pedestrian's appearance remains consistent before and after the occlusion. We therefore design a multi-object pedestrian tracking method that combines re-identification feature assistance with multi-stage data association (RAMA). The method focuses on the role of low-confidence bounding boxes in MOT and introduces a separately trained pedestrian re-identification model to extract discriminative pedestrian features, which are then incorporated into the multi-stage data association algorithm to improve tracking accuracy. RAMA exhibits stronger identity association ability on the MOT16 and MOT17 test sets, achieving IDF1 scores of 75.0% and 74.5%, respectively.
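The association idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and everything specific in it is an assumption made for illustration: the dictionary fields (`box`, `score`, `feat`), the confidence threshold `high_conf`, the appearance weight `w_app`, the gates `gate1` and `gate2`, and the choice to use appearance only in the first stage. Track boxes are assumed to be already predicted for the current frame (for example by a Kalman filter), the re-identification features are toy 2-D vectors, and an exhaustive search over assignments stands in for the Hungarian algorithm, so it is only practical for a handful of boxes.

```python
from itertools import permutations
from math import sqrt

def iou(a, b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def cosine_distance(u, v):
    """1 - cosine similarity of two feature vectors (1.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(u, v))
    nu, nv = sqrt(sum(x * x for x in u)), sqrt(sum(y * y for y in v))
    return 1.0 - dot / (nu * nv) if nu > 0 and nv > 0 else 1.0

def assign(cost, max_cost):
    """Minimum-total-cost one-to-one assignment by exhaustive search.

    Stand-in for the Hungarian algorithm; pairs whose cost exceeds
    max_cost are discarded after the optimum is found.
    """
    n = len(cost)
    m = len(cost[0]) if n else 0
    if n == 0 or m == 0:
        return []
    if n <= m:   # every row gets a distinct column
        candidates = ([(i, p[i]) for i in range(n)] for p in permutations(range(m), n))
    else:        # every column gets a distinct row
        candidates = ([(p[j], j) for j in range(m)] for p in permutations(range(n), m))
    best, best_total = [], float("inf")
    for pairs in candidates:
        total = sum(cost[i][j] for i, j in pairs)
        if total < best_total:
            best, best_total = pairs, total
    return [(i, j) for i, j in best if cost[i][j] <= max_cost]

def associate(tracks, detections, high_conf=0.6, w_app=0.5, gate1=0.7, gate2=0.5):
    """Two-stage association; returns {track_index: detection_index}."""
    high = [j for j, d in enumerate(detections) if d["score"] >= high_conf]
    low = [j for j, d in enumerate(detections) if d["score"] < high_conf]
    matches = {}

    # Stage 1: high-confidence detections, fused IoU + appearance cost.
    cost = [[(1 - w_app) * (1 - iou(t["box"], detections[j]["box"]))
             + w_app * cosine_distance(t["feat"], detections[j]["feat"])
             for j in high] for t in tracks]
    for r, c in assign(cost, gate1):
        matches[r] = high[c]

    # Stage 2: remaining tracks vs. low-confidence detections, IoU only.
    rest = [i for i in range(len(tracks)) if i not in matches]
    cost = [[1 - iou(tracks[i]["box"], detections[j]["box"]) for j in low] for i in rest]
    for r, c in assign(cost, gate2):
        matches[rest[r]] = low[c]
    return matches

# Toy frame: two nearby pedestrians plus one partly occluded (low score).
tracks = [
    {"box": (0, 0, 10, 20), "feat": (1.0, 0.0)},
    {"box": (8, 0, 18, 20), "feat": (0.0, 1.0)},
    {"box": (50, 0, 60, 20), "feat": (1.0, 1.0)},
]
detections = [
    {"box": (1, 0, 11, 20), "score": 0.9, "feat": (0.9, 0.1)},
    {"box": (9, 0, 19, 20), "score": 0.8, "feat": (0.1, 0.9)},
    {"box": (51, 0, 61, 20), "score": 0.3, "feat": (1.0, 1.0)},
]
matches = associate(tracks, detections)
print(matches)
```

In this toy frame the first two tracks are matched in the first stage, where appearance helps separate the two overlapping pedestrians, and the third track is recovered in the second stage from the low-confidence box that a single high-threshold pass would have discarded.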