Ramanan Deva, Forsyth David A, Zisserman Andrew
IEEE Trans Pattern Anal Mach Intell. 2007 Jan;29(1):65-81. doi: 10.1109/tpami.2007.250600.
An open vision problem is to automatically track the articulations of people from a video sequence. This problem is difficult because one must both determine the number of people in each frame and estimate their configurations. Finding people and localizing their limbs is hard because people can move fast and unpredictably, can appear in a variety of poses and clothes, and are often surrounded by limb-like clutter. We develop a completely automatic system that works in two stages: it first builds a model of appearance of each person in a video and then tracks by detecting those models in each frame ("tracking by model-building and detection"). We develop two algorithms that build models. The first is a bottom-up approach that groups together candidate body parts found throughout a sequence. The second is a top-down approach that automatically builds people-models by detecting convenient key poses within a sequence. Finally, we show that building a discriminative model of appearance is quite helpful since it exploits structure in the background (without background subtraction). We demonstrate the resulting tracker on hundreds of thousands of frames of unscripted indoor and outdoor activity, a feature-length film ("Run Lola Run"), and legacy sports footage (from the 2002 World Series and 1998 Winter Olympics). Experiments suggest that our system 1) can count distinct individuals, 2) can identify and track them, 3) can recover when it loses track, for example, if individuals are occluded or briefly leave the view, 4) can identify body configuration accurately, and 5) is not dependent on particular models of human motion.
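The two-stage idea ("tracking by model-building and detection") can be illustrated with a toy sketch. This is not the authors' method: the paper builds articulated appearance models of limbs, whereas the sketch below assumes a single square patch whose appearance is summarized by its mean grayscale intensity. All names (`build_appearance_model`, `detect`, `track`, `make_frame`) and the `max_error` threshold are hypothetical choices made for illustration only.

```python
from typing import List, Optional, Tuple

Frame = List[List[float]]  # grayscale image stored as rows of intensities
Box = Tuple[int, int]      # (top, left) corner of a square patch


def patch_mean(frame: Frame, top: int, left: int, size: int) -> float:
    """Mean intensity of the size x size patch whose corner is (top, left)."""
    total = sum(frame[r][c]
                for r in range(top, top + size)
                for c in range(left, left + size))
    return total / (size * size)


def build_appearance_model(frames: List[Frame], boxes: List[Box], size: int) -> float:
    """Stage 1 (toy): average the patch appearance over frames where the
    target location is assumed to be known already."""
    means = [patch_mean(f, top, left, size) for f, (top, left) in zip(frames, boxes)]
    return sum(means) / len(means)


def detect(frame: Frame, model: float, size: int, max_error: float) -> Optional[Box]:
    """Stage 2 (toy): scan one frame for the patch closest to the model.
    Returns None when no patch is within max_error of the model."""
    best, best_err = None, max_error
    for top in range(len(frame) - size + 1):
        for left in range(len(frame[0]) - size + 1):
            err = abs(patch_mean(frame, top, left, size) - model)
            if err < best_err:
                best, best_err = (top, left), err
    return best


def track(frames: List[Frame], model: float, size: int,
          max_error: float = 0.2) -> List[Optional[Box]]:
    """Detect independently in every frame; no motion model links the frames."""
    return [detect(f, model, size, max_error) for f in frames]


def make_frame(pos: Optional[Box], height: int = 6, width: int = 6, size: int = 2) -> Frame:
    """Synthetic frame: dark background with an optional bright square at pos."""
    frame = [[0.0] * width for _ in range(height)]
    if pos is not None:
        for r in range(pos[0], pos[0] + size):
            for c in range(pos[1], pos[1] + size):
                frame[r][c] = 1.0
    return frame


# The target is visible, moves, disappears for one frame, then reappears.
frames = [make_frame((0, 0)), make_frame((1, 2)), make_frame(None), make_frame((3, 3))]
model = build_appearance_model(frames[:1], [(0, 0)], size=2)
positions = track(frames, model, size=2)
```

Because each frame is searched independently against the stored appearance model, the toy tracker picks the target up again after the frame in which it is missing, and it makes no assumption about how the target moves between frames. Those are the properties the abstract lists as items 3 and 5, shown here only in a deliberately simplified setting.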