IEEE Trans Image Process. 2022;31:1938-1948. doi: 10.1109/TIP.2022.3149229. Epub 2022 Feb 16.
A key challenge in human pose and shape estimation is occlusion, including self-occlusion, object-human occlusion, and inter-person occlusion. The lack of diverse and accurate pose and shape training data is a major bottleneck, especially for in-the-wild scenes with occlusions. In this paper, we focus on estimating human pose and shape under inter-person occlusion, while also handling object-human occlusion and self-occlusion. We propose a novel framework that synthesizes occlusion-aware silhouette and 2D keypoint data and directly regresses the SMPL pose and shape parameters. A neural 3D mesh renderer is used to enable silhouette supervision on the fly, which substantially improves shape estimation. In addition, keypoint-and-silhouette-driven training data are synthesized from panoramic viewpoints to compensate for the limited viewpoint diversity of existing datasets. Experimental results show that our method is among the state of the art on the 3DPW and 3DPW-Crowd datasets in terms of pose estimation accuracy. The proposed method clearly outperforms Mesh Transformer, 3DCrowdNet, and ROMP in terms of shape estimation. It also achieves top shape prediction accuracy on SSP-3D. Demo and code will be available at https://igame-lab.github.io/LASOR/.
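The abstract states only that keypoint-and-silhouette training data are synthesized from panoramic viewpoints. As a hedged illustration of the keypoint half of that idea, the sketch below rotates a set of 3D joints about the vertical axis at evenly spaced angles and projects each rotated copy with a simple pinhole camera. The function names (`rotate_y`, `project`, `panoramic_keypoints`), the y-up convention, and the fixed `focal` and `depth` values are assumptions for illustration, not the authors' pipeline, which also renders silhouettes and models occlusion.

```python
import math
from typing import List, Tuple

Point3 = Tuple[float, float, float]
Point2 = Tuple[float, float]


def rotate_y(joints: List[Point3], angle_rad: float) -> List[Point3]:
    """Rotate 3D joints about the vertical (y) axis (assumed y-up convention)."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    # Standard rotation matrix R_y = [[c, 0, s], [0, 1, 0], [-s, 0, c]].
    return [(c * x + s * z, y, -s * x + c * z) for x, y, z in joints]


def project(joints: List[Point3], focal: float, depth: float) -> List[Point2]:
    """Pinhole projection after placing the body `depth` units in front of the camera."""
    points = []
    for x, y, z in joints:
        zc = z + depth  # camera-space depth (hypothetical fixed camera distance)
        if zc <= 0:
            raise ValueError("joint lies behind the camera; increase depth")
        points.append((focal * x / zc, focal * y / zc))
    return points


def panoramic_keypoints(joints: List[Point3], num_views: int = 8,
                        focal: float = 1000.0, depth: float = 5.0) -> List[List[Point2]]:
    """Return 2D keypoints for `num_views` evenly spaced viewpoints around the body."""
    views = []
    for k in range(num_views):
        angle = 2.0 * math.pi * k / num_views
        views.append(project(rotate_y(joints, angle), focal, depth))
    return views
```

For example, `panoramic_keypoints([(0.0, 0.0, 0.0), (1.0, 0.5, 0.0)], num_views=4, focal=1.0)` yields four views; a joint at the rotation center always projects to the image origin, while off-center joints sweep across the image as the viewpoint changes.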