Ren Weihong, Wang Xinchao, Tian Jiandong, Tang Yandong, Chan Antoni B
IEEE Trans Image Process. 2021;30:1439-1452. doi: 10.1109/TIP.2020.3044219. Epub 2020 Dec 29.
State-of-the-art multi-object tracking (MOT) methods follow the tracking-by-detection paradigm, where object trajectories are obtained by associating the per-frame outputs of object detectors. In crowded scenes, however, detectors often fail to produce accurate detections because of heavy occlusion and high crowd density. In this paper, we propose a new MOT paradigm, tracking-by-counting, tailored for crowded scenes. Using crowd density maps, we jointly model detection, counting, and tracking of multiple targets as a network flow program, which simultaneously finds the globally optimal detections and trajectories of multiple targets over the whole video. This is in contrast to prior MOT methods, which either ignore the crowd density and are thus prone to errors in crowded scenes, or rely on a suboptimal two-step process that uses heuristic density-aware point-tracks to match targets. Our approach yields promising results on public benchmarks across various domains, including people tracking, cell tracking, and fish tracking.
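To make the phrase "network flow program" concrete, the sketch below shows the classic min-cost-flow view of tracking-by-detection, in which each unit of flow from a source to a sink is one trajectory. It is not the formulation proposed in this paper: it omits the density-map and counting terms entirely, works on toy 1-D positions, and every name and cost value (`track`, `entry_cost`, `det_reward`, `max_jump`, and so on) is an assumption made for illustration.

```python
from collections import defaultdict


class MinCostFlow:
    """Tiny successive-shortest-path min-cost flow (illustrative only)."""

    def __init__(self):
        self.graph = defaultdict(list)  # node -> indices into self.edges
        self.edges = []                 # [to, residual capacity, cost]; edge i ^ 1 is its reverse

    def add_edge(self, u, v, cap, cost):
        self.graph[u].append(len(self.edges))
        self.edges.append([v, cap, cost])
        self.graph[v].append(len(self.edges))
        self.edges.append([u, 0, -cost])

    def shortest_path(self, s, t):
        # Bellman-Ford, because detection edges carry negative cost.
        dist = {s: 0.0}
        prev = {}
        nodes = list(self.graph)
        for _ in range(len(nodes)):
            changed = False
            for u in nodes:
                if u not in dist:
                    continue
                for ei in self.graph[u]:
                    v, cap, cost = self.edges[ei]
                    if cap > 0 and dist[u] + cost < dist.get(v, float("inf")):
                        dist[v] = dist[u] + cost
                        prev[v] = ei
                        changed = True
            if not changed:
                break
        return dist.get(t, float("inf")), prev


def track(frames, entry_cost=2.0, exit_cost=2.0, det_reward=-5.0, max_jump=3.0):
    """frames: list of lists of 1-D positions. Returns tracks as lists of (frame, index)."""
    mcf = MinCostFlow()
    for t, frame in enumerate(frames):
        for i, x in enumerate(frame):
            mcf.add_edge("S", ("in", t, i), 1, entry_cost)              # a track may start here
            mcf.add_edge(("in", t, i), ("out", t, i), 1, det_reward)    # reward for using the detection
            mcf.add_edge(("out", t, i), "T", 1, exit_cost)              # a track may end here
            if t + 1 < len(frames):
                for j, y in enumerate(frames[t + 1]):
                    if abs(x - y) <= max_jump:                          # link to nearby detections only
                        mcf.add_edge(("out", t, i), ("in", t + 1, j), 1, abs(x - y))

    # Push one unit of flow (one trajectory) at a time while it lowers the total cost.
    while True:
        cost, prev = mcf.shortest_path("S", "T")
        if cost >= 0:
            break
        v = "T"
        while v != "S":
            ei = prev[v]
            mcf.edges[ei][1] -= 1
            mcf.edges[ei ^ 1][1] += 1
            v = mcf.edges[ei ^ 1][0]

    def used(u):
        # Successors of u along saturated forward edges (even indices are forward edges).
        return [mcf.edges[ei][0] for ei in mcf.graph[u]
                if ei % 2 == 0 and mcf.edges[ei][1] == 0]

    tracks = []
    for node in used("S"):
        path = []
        while node != "T":
            kind, t, i = node
            if kind == "in":
                path.append((t, i))
            node = used(node)[0]
        tracks.append(path)
    return tracks


if __name__ == "__main__":
    # Two well-separated targets drifting right over three frames.
    print(track([[0.0, 10.0], [1.0, 11.0], [2.0, 12.0]]))
```

In this toy setup the solver stops adding trajectories once an extra unit of flow would no longer reduce the total cost, so the number of tracks is chosen by the optimization rather than fixed in advance. A density-aware variant in the spirit of the abstract would need additional terms tying the flow to the crowd density map, which this sketch does not attempt.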