IEEE Trans Pattern Anal Mach Intell. 2019 May;41(5):1227-1241. doi: 10.1109/TPAMI.2018.2828427. Epub 2018 Apr 19.
We propose a method for estimating 3D human poses from single images or video sequences. The task is challenging because: (a) many 3D poses can have similar 2D projections, which makes the lifting ambiguous, and (b) current 2D joint detectors are not accurate, which can cause large errors in the 3D estimates. We represent 3D poses as a sparse combination of bases that encode structural pose priors to reduce the lifting ambiguity. This prior is strengthened by adding limb length constraints. We estimate the 3D pose by minimizing an L1-norm measurement error between the 2D pose and the projection of the 3D pose, because the L1 norm is less sensitive to inaccurate 2D poses. We modify our algorithm to output K 3D pose candidates for an image, and for videos we impose a temporal smoothness constraint to select the best sequence of 3D poses from the candidates. We demonstrate good results on 3D pose estimation from static images and improved performance by selecting the best 3D pose from the K proposals. Our results on video sequences also show an improvement of roughly 15% over static images.
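The following is a minimal sketch of why an L1 measurement error with a sparse basis prior tolerates a badly detected joint. It is not the authors' implementation. It assumes a known orthographic camera (the full method also estimates the camera), uses random bases in place of learned ones, omits the limb length constraints, and approximates the L1 objective by iteratively reweighted least squares (IRLS). The names `irls_l1`, `lam`, `B`, and `P` are hypothetical.

```python
import numpy as np

def irls_l1(A, b, lam=0.0, iters=200, eps=1e-6):
    """Approximately minimize ||A w - b||_1 + lam * ||w||_1 with IRLS.

    The sparsity term is folded in by stacking lam * I under A with zero
    targets, since ||[A; lam*I] w - [b; 0]||_1 equals the full objective.
    """
    k = A.shape[1]
    A_aug = np.vstack([A, lam * np.eye(k)])
    b_aug = np.concatenate([b, np.zeros(k)])
    w = np.linalg.lstsq(A_aug, b_aug, rcond=None)[0]  # least-squares start
    for _ in range(iters):
        r = np.abs(A_aug @ w - b_aug)
        sw = 1.0 / np.sqrt(np.maximum(r, eps))  # sqrt of IRLS weights 1/|r|
        w = np.linalg.lstsq(A_aug * sw[:, None], b_aug * sw, rcond=None)[0]
    return w

rng = np.random.default_rng(0)
J, k = 15, 8                                   # hypothetical joint and basis counts
B = rng.standard_normal((k, 3, J))             # hypothetical pose bases, 3 x J each
P = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])  # assumed orthographic camera
w_true = np.zeros(k)
w_true[[1, 4]] = [1.0, -0.5]                   # sparse ground-truth coefficients

x2d = P @ np.tensordot(w_true, B, axes=1)      # 2 x J projection of S = sum_i w_i B_i
x2d[:, 3] += 5.0                               # simulate one badly detected joint
b = x2d.ravel()
A = np.stack([(P @ B[i]).ravel() for i in range(k)], axis=1)  # shape (2J, k)

w_l2 = np.linalg.lstsq(A, b, rcond=None)[0]    # squared-error fit for comparison
w_l1 = irls_l1(A, b, lam=0.01)                 # L1 fit with sparsity prior
err_l2 = np.linalg.norm(w_l2 - w_true)
err_l1 = np.linalg.norm(w_l1 - w_true)
print(f"L2 coefficient error: {err_l2:.4f}, L1 coefficient error: {err_l1:.4f}")
```

Because the corrupted joint affects only two of the 30 measurements, the L1 fit can essentially ignore them, while the least-squares fit spreads their error across all coefficients.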
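Selecting one candidate per frame under a temporal smoothness constraint can be posed as a shortest path over a chain, which dynamic programming (Viterbi) solves exactly. The sketch below assumes a per-candidate cost (for example, the fitting error) plus a weighted squared distance between consecutive poses; the cost form, the weight `alpha`, and the function name `select_smooth_sequence` are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def select_smooth_sequence(cands, unary, alpha=1.0):
    """Pick one candidate per frame by dynamic programming.

    cands: (T, K, D) flattened 3D pose candidates per frame (hypothetical layout).
    unary: (T, K) per-candidate cost, e.g. the fitting error.
    Minimizes sum_t unary[t, k_t]
              + alpha * sum_t ||cands[t, k_t] - cands[t-1, k_{t-1}]||^2.
    """
    T, K, _ = cands.shape
    cost = unary[0].copy()                  # best path cost ending at each frame-0 candidate
    back = np.zeros((T, K), dtype=int)      # back-pointers to the best previous candidate
    for t in range(1, T):
        # pairwise[i, j]: squared distance from candidate i at t-1 to candidate j at t
        diff = cands[t - 1][:, None, :] - cands[t][None, :, :]
        pairwise = alpha * np.sum(diff ** 2, axis=-1)
        total = cost[:, None] + pairwise    # shape (K_prev, K_cur)
        back[t] = np.argmin(total, axis=0)
        cost = total[back[t], np.arange(K)] + unary[t]
    path = [int(np.argmin(cost))]
    for t in range(T - 1, 0, -1):           # follow back-pointers from the last frame
        path.append(int(back[t, path[-1]]))
    return path[::-1]

# Toy example: candidate 1 sits far away and is only favored in frame 1.
cands = np.array([[[0.0], [10.0]],
                  [[0.1], [10.0]],
                  [[0.2], [10.0]]])
unary = np.array([[0.0, 0.5],
                  [0.5, 0.0],
                  [0.0, 0.5]])
print(select_smooth_sequence(cands, unary))             # smooth: [0, 0, 0]
print(select_smooth_sequence(cands, unary, alpha=0.0))  # per-frame best: [0, 1, 0]
```

With `alpha=0.0` the selection reduces to choosing each frame's lowest-cost candidate independently; a positive `alpha` trades a slightly worse per-frame fit for a consistent sequence.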