Mori Greg, Malik Jitendra
School of Computing Science, Simon Fraser University, Burnaby, BC V5A 1S6, Canada.
IEEE Trans Pattern Anal Mach Intell. 2006 Jul;28(7):1052-62. doi: 10.1109/TPAMI.2006.149.
The problem we consider in this paper is to take a single two-dimensional image containing a human figure, locate the joint positions, and use these to estimate the body configuration and pose in three-dimensional space. The basic approach is to store a number of exemplar 2D views of the human body in a variety of different configurations and viewpoints with respect to the camera. On each of these stored views, the locations of the body joints (left elbow, right knee, etc.) are manually marked and labeled for future use. The input image is then matched to each stored view, using the technique of shape context matching in conjunction with a kinematic chain-based deformation model. Assuming that there is a stored view sufficiently similar in configuration and pose, the correspondence process will succeed. The locations of the body joints are then transferred from the exemplar view to the test shape. Given the 2D joint locations, the 3D body configuration and pose are then estimated using an existing algorithm. We can apply this technique to video by treating each frame independently, so tracking reduces to repeated recognition. We present results on a variety of data sets.
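The exemplar-matching and joint-transfer steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes shapes are already given as lists of sampled 2D edge points, replaces the paper's matching with a greedy nearest-descriptor assignment, omits the kinematic chain-based deformation model and the 3D estimation step, and moves each joint by the mean displacement of its nearest sample points. All function names, parameters, and the toy data are hypothetical.

```python
import math

def shape_contexts(points, n_r=5, n_theta=12, r_min=0.125, r_max=2.0):
    """Return one normalised log-polar histogram per point."""
    n = len(points)
    # Normalise radii by the mean pairwise distance for rough scale invariance.
    pair_dists = [math.dist(points[i], points[j])
                  for i in range(n) for j in range(i + 1, n)]
    scale = sum(pair_dists) / len(pair_dists)
    # Log-spaced outer edges of the radial bins.
    edges = [r_min * (r_max / r_min) ** (k / (n_r - 1)) for k in range(n_r)]
    descriptors = []
    for i, (px, py) in enumerate(points):
        hist = [0.0] * (n_r * n_theta)
        for j, (qx, qy) in enumerate(points):
            if i == j:
                continue
            dx, dy = qx - px, qy - py
            r = math.hypot(dx, dy) / scale
            r_bin = next((k for k, e in enumerate(edges) if r <= e), None)
            if r_bin is None:
                continue  # farther than r_max: ignored
            angle = math.atan2(dy, dx) % (2 * math.pi)
            t_bin = min(int(angle / (2 * math.pi) * n_theta), n_theta - 1)
            hist[r_bin * n_theta + t_bin] += 1.0
        total = sum(hist)
        descriptors.append([h / total for h in hist] if total else hist)
    return descriptors

def chi2(a, b):
    """Chi-squared distance between two normalised histograms."""
    return 0.5 * sum((x - y) ** 2 / (x + y) for x, y in zip(a, b) if x + y > 0)

def match(exemplar_points, test_points):
    """For each exemplar point pick the cheapest test point (greedy, not one-to-one)."""
    ex_desc = shape_contexts(exemplar_points)
    te_desc = shape_contexts(test_points)
    mapping, total = [], 0.0
    for d in ex_desc:
        costs = [chi2(d, t) for t in te_desc]
        j = min(range(len(costs)), key=costs.__getitem__)
        mapping.append(j)
        total += costs[j]
    return mapping, total / len(ex_desc)

def transfer_joints(exemplar, test_points, mapping, k=3):
    """Assumed rule: shift each labelled joint by the mean displacement
    of its k nearest exemplar sample points."""
    pts = exemplar["points"]
    out = {}
    for name, joint in exemplar["joints"].items():
        nearest = sorted(range(len(pts)), key=lambda i: math.dist(pts[i], joint))[:k]
        dx = sum(test_points[mapping[i]][0] - pts[i][0] for i in nearest) / len(nearest)
        dy = sum(test_points[mapping[i]][1] - pts[i][1] for i in nearest) / len(nearest)
        out[name] = (joint[0] + dx, joint[1] + dy)
    return out

def locate_joints(exemplars, test_points):
    """Match against every stored view, keep the cheapest, transfer its joints."""
    (mapping, cost), exemplar = min(
        ((match(e["points"], test_points), e) for e in exemplars),
        key=lambda item: item[0][1],
    )
    return exemplar["name"], cost, transfer_joints(exemplar, test_points, mapping)

# Toy data: a 7-point stick figure and an unrelated straight-line shape.
figure = [(0, 8), (0, 6), (-3, 4), (3, 4), (0, 3), (-2, 0), (2, 0)]
exemplars = [
    {"name": "line", "points": [(x, 0) for x in range(8)],
     "joints": {"neck": (3.5, 0.0)}},
    {"name": "standing", "points": figure,
     "joints": {"neck": (0.0, 6.0), "left_knee": (-1.0, 1.5)}},
]
test_shape = [(x + 5, y + 3) for x, y in figure]  # same pose, shifted in the image
name, cost, joints = locate_joints(exemplars, test_shape)
print(name, cost, joints)
```

Applied to video in the way the abstract describes, `locate_joints` would simply be called once per frame with no state carried between frames. A one-to-one assignment, such as the Hungarian algorithm, would be a natural replacement for the greedy step in `match`.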