Agarwal Ankur, Triggs Bill
INRIA Rhône-Alpes, 665, Avenue de l'Europe, 38330 Montbonnot, France.
IEEE Trans Pattern Anal Mach Intell. 2006 Jan;28(1):44-58. doi: 10.1109/TPAMI.2006.21.
We describe a learning-based method for recovering 3D human body pose from single images and monocular image sequences. Our approach requires neither an explicit body model nor prior labeling of body parts in the image. Instead, it recovers pose by direct nonlinear regression against shape descriptor vectors extracted automatically from image silhouettes. For robustness against local silhouette segmentation errors, silhouette shape is encoded by histogram-of-shape-contexts descriptors. We evaluate several different regression methods: ridge regression, Relevance Vector Machine (RVM) regression, and Support Vector Machine (SVM) regression over both linear and kernel bases. The RVMs provide much sparser regressors without compromising performance, and kernel bases give a small but worthwhile improvement in performance. The loss of depth and limb labeling information often makes the recovery of 3D pose from single silhouettes ambiguous. To handle this, the method is embedded in a novel regressive tracking framework, using dynamics from the previous state estimate together with a learned regression value to disambiguate the pose. We show that the resulting system tracks long sequences stably. For realism and good generalization over a wide range of viewpoints, we train the regressors on images resynthesized from real human motion capture data. The method is demonstrated for several representations of full body pose, both quantitatively on independent but similar test data and qualitatively on real image sequences. Mean angular errors of 4-6 degrees are obtained for a variety of walking motions.
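As an illustration of the regression step only, the following is a minimal sketch of ridge regression over a Gaussian kernel basis, one of the regressor families the abstract lists. It is not the authors' implementation. The random input vectors merely stand in for histogram-of-shape-contexts descriptors, the synthetic targets stand in for pose vectors, and every name and parameter value (`gaussian_basis`, `fit_ridge`, `predict_pose`, `gamma`, `lam`) is a hypothetical choice made for this example. The RVM and SVM variants and the tracking framework are not shown.

```python
import numpy as np

def gaussian_basis(X, centers, gamma):
    """Evaluate a Gaussian kernel basis: one basis function per center."""
    # Squared Euclidean distance between every row of X and every center.
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-gamma * d2)

def fit_ridge(Phi, Y, lam):
    """Regularized least squares: minimize ||Phi A - Y||^2 + lam ||A||^2."""
    n_basis = Phi.shape[1]
    return np.linalg.solve(Phi.T @ Phi + lam * np.eye(n_basis), Phi.T @ Y)

def predict_pose(X_new, centers, A, gamma):
    """Map descriptor vectors to pose vectors with the fitted weights A."""
    return gaussian_basis(X_new, centers, gamma) @ A

# Synthetic stand-ins (assumptions, not data from the paper):
# 200 "descriptors" of dimension 10 and 3 "pose angles" per example.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 10))
Y_train = np.tanh(X_train @ rng.normal(size=(10, 3)))

gamma, lam = 0.5, 1e-3  # hypothetical kernel width and regularization
Phi_train = gaussian_basis(X_train, X_train, gamma)
A = fit_ridge(Phi_train, Y_train, lam)

# Training-set fit and a prediction for five unseen descriptors.
train_rmse = np.sqrt(np.mean((Phi_train @ A - Y_train) ** 2))
Y_new = predict_pose(rng.normal(size=(5, 10)), X_train, A, gamma)
```

Here every training example serves as a kernel center, so the weight matrix `A` is dense. A sparse method such as the RVM would keep only a small subset of those centers, which is the sparsity advantage the abstract reports.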