Zhou Xiaowei, Zhu Menglong, Pavlakos Georgios, Leonardos Spyridon, Derpanis Konstantinos G, Daniilidis Kostas
IEEE Trans Pattern Anal Mach Intell. 2019 Apr;41(4):901-914. doi: 10.1109/TPAMI.2018.2816031. Epub 2018 Mar 15.
Recovering 3D full-body human pose is a challenging problem with many applications. It has been successfully addressed by motion capture systems with body-worn markers and multiple cameras. In this paper, we address the more challenging case of using only a single camera and no markers: going directly from 2D appearance to 3D geometry. Deep learning approaches have shown remarkable abilities to discriminatively learn 2D appearance features. The missing piece is how to integrate 2D, 3D, and temporal information to recover 3D geometry and account for the uncertainties arising from the discriminative model. We introduce a novel approach that treats 2D joint locations as latent variables whose uncertainty distributions are given by a deep fully convolutional neural network. The unknown 3D poses are modeled by a sparse representation, and the 3D parameters are estimated via an Expectation-Maximization algorithm, in which the 2D joint location uncertainties can be conveniently marginalized out during inference. Extensive evaluation on benchmark datasets shows that the proposed approach achieves greater accuracy than state-of-the-art baselines. Notably, the proposed approach does not require synchronized 2D-3D data for training and is applicable to "in-the-wild" images, which is demonstrated with the MPII dataset.
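To make the idea of treating 2D joint locations as latent variables more concrete, the following is a minimal single-frame sketch, not the authors' implementation. It assumes an orthographic camera with known (identity) rotation, an isotropic Gaussian link between projected and observed joints, and ISTA for the L1-penalized update; it ignores temporal information and camera estimation entirely. The names `expected_joints`, `soft_threshold`, `fit_sparse_pose`, and the parameters `sigma`, `lam`, `n_em`, `n_ista` are hypothetical choices for illustration.

```python
import numpy as np

def expected_joints(heatmaps, proj, sigma):
    """E-step (sketch): posterior mean of each 2D joint over the pixel grid.

    The posterior is taken proportional to heatmap * Gaussian(pixel; proj, sigma),
    so the heatmap uncertainty is summed out instead of picking its argmax.
    """
    J, H, W = heatmaps.shape
    ys, xs = np.mgrid[0:H, 0:W]
    grid = np.stack([xs.ravel(), ys.ravel()], axis=1).astype(float)  # (H*W, 2) as (x, y)
    means = np.empty((J, 2))
    for j in range(J):
        d2 = ((grid - proj[j]) ** 2).sum(axis=1)
        log_post = np.log(heatmaps[j].ravel() + 1e-12) - d2 / (2.0 * sigma ** 2)
        post = np.exp(log_post - log_post.max())  # subtract max for numerical stability
        post /= post.sum()
        means[j] = post @ grid
    return means

def soft_threshold(x, t):
    """Proximal operator of the L1 norm."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def fit_sparse_pose(heatmaps, basis, sigma=2.0, lam=0.1, n_em=20, n_ista=50):
    """Alternate E- and M-steps for sparse coefficients over a 3D pose dictionary.

    heatmaps: (J, H, W) non-negative per-joint maps (assumed CNN output).
    basis:    (K, J, 3) dictionary of 3D poses in pixel units (assumption).
    """
    K, J, _ = basis.shape
    A = basis[:, :, :2].reshape(K, -1).T           # (2J, K): orthographic projection of each basis pose
    step = 1.0 / (np.linalg.norm(A, 2) ** 2 + 1e-12)  # 1 / Lipschitz constant of the quadratic term
    c = np.full(K, 1.0 / K)                        # start from the mean of the dictionary
    for _ in range(n_em):
        proj = (A @ c).reshape(J, 2)
        target = expected_joints(heatmaps, proj, sigma).ravel()   # E-step
        for _ in range(n_ista):                                   # M-step: L1-penalized least squares
            grad = A.T @ (A @ c - target)
            c = soft_threshold(c - step * grad, step * lam)
    pose3d = np.tensordot(c, basis, axes=1)        # (J, 3) reconstructed pose
    return c, pose3d
```

In this sketch the E-step never commits to a single heatmap peak: each joint's expected location is a weighted average over all pixels, which is one simple way to account for detector uncertainty when fitting the sparse 3D model.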