IEEE Trans Pattern Anal Mach Intell. 2022 Sep;44(9):5225-5242. doi: 10.1109/TPAMI.2021.3070562. Epub 2022 Aug 4.
Crowded-scene surveillance can benefit significantly from combining an egocentric-view camera with a complementary top-view camera. A typical setting pairs an egocentric-view camera, e.g., a wearable camera on the ground that captures rich local details, with a top-view camera, e.g., a drone-mounted one at high altitude that provides a global picture of the scene. To collaboratively analyze such complementary-view videos, an important task is to associate and track multiple people across views and over time. This task is challenging and differs from classical human tracking, since we need not only to track multiple subjects in each video, but also to identify the same subjects across the two complementary views. This paper formulates the task as a constrained mixed integer programming problem, wherein a major challenge is how to effectively measure subject similarity over time in each video and across the two views. Although appearance and motion consistencies apply well to over-time association, they are not good at connecting two highly different complementary views. To this end, we present a spatial-distribution-based approach to reliable cross-view subject association. We also build a dataset to benchmark this new and challenging task. Extensive experiments verify the effectiveness of our method.
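To make the cross-view association idea concrete, here is a minimal illustrative sketch, not the paper's actual formulation: each subject is described by the normalized, sorted distances to the other subjects in its view (a simple spatial-distribution descriptor), and subjects are then matched across views by solving a linear assignment problem over descriptor distances. The descriptor design and the use of `scipy.optimize.linear_sum_assignment` here are assumptions for illustration; the paper itself solves a constrained mixed integer program that also couples over-time tracking.

```python
# Illustrative sketch (NOT the paper's method): cross-view subject
# association via spatial-distribution descriptors + linear assignment.
import numpy as np
from scipy.optimize import linear_sum_assignment


def spatial_descriptor(points):
    """For each subject, the sorted distances to all other subjects.

    The two views have different coordinate frames and scales, so we
    normalize by the mean distance to make the descriptor scale-invariant.
    """
    points = np.asarray(points, dtype=float)
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    desc = np.sort(d, axis=1)[:, 1:]   # drop the zero self-distance
    return desc / desc.mean()          # scale normalization


def associate(ego_pts, top_pts):
    """Match subjects across views by spatial-distribution similarity.

    Returns a list of (ego_index, top_index) pairs. Assumes the same
    number of subjects is visible in both views (the real problem also
    handles missing/extra detections via constraints).
    """
    a = spatial_descriptor(ego_pts)
    b = spatial_descriptor(top_pts)
    # Cost matrix: descriptor distance between every cross-view pair.
    cost = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    rows, cols = linear_sum_assignment(cost)
    return list(zip(rows, cols))
```

For example, if the top view observes the same people after an unknown scaling, translation, and reordering, the spatial-distribution descriptors are unchanged, so the assignment recovers the correct cross-view correspondence. The key design choice this sketch reflects is the paper's insight that relative spatial layout, unlike appearance or motion, survives the drastic viewpoint change between the two views.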