IEEE Trans Vis Comput Graph. 2020 May;26(5):1851-1859. doi: 10.1109/TVCG.2020.2973076. Epub 2020 Feb 13.
Point cloud-based 3D human pose estimation, which aims to recover the 3D locations of human skeleton joints, plays an important role in many AR/VR applications. The success of existing methods generally relies on large-scale data annotated with 3D human joints. However, annotating 3D human joints from input depth images or point clouds is labor-intensive and error-prone, due to self-occlusion between body parts and the tedious annotation process on 3D point clouds. Meanwhile, it is much easier to construct human pose datasets with 2D human joint annotations on depth images. To address this problem, we present a weakly supervised adversarial learning framework for 3D human pose estimation from point clouds. In contrast to existing 3D human pose estimation methods for depth images or point clouds, we exploit both weakly supervised data with only 2D human joint annotations and fully supervised data with 3D human joint annotations. To relieve the pose ambiguity caused by weak supervision, we adopt adversarial learning to ensure that the recovered human pose is valid. Instead of using either the 2D or the 3D representation of the depth image as in previous methods, we exploit both the point cloud and the input depth image. We adopt a 2D CNN to extract 2D human joints from the input depth image; the 2D joints help us obtain initial 3D human joints and select effective sampling points, which reduces the computational cost of 3D human pose regression with the point cloud network. The point cloud network also narrows the domain gap between the network input (i.e., point clouds) and the 3D joints. Thanks to the weakly supervised adversarial learning framework, our method can recover accurate 3D human poses from point clouds. Experiments on the ITOP and EVAL datasets demonstrate that our method achieves state-of-the-art performance efficiently.
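To make the 2D-to-3D lifting step concrete, below is a minimal Python sketch of how 2D joint detections could be back-projected to initial 3D joints using the depth map and pinhole camera intrinsics, and how cloud points near those joints could be subsampled to cut the cost of the point cloud regressor. The function names, the sampling radius, and the intrinsics parameters (fx, fy, cx, cy) are illustrative assumptions, not the paper's actual interface.

    import numpy as np

    def backproject_joints(joints_2d, depth, fx, fy, cx, cy):
        """Lift 2D joint detections (pixel coords) to initial 3D joints
        using the depth map and pinhole intrinsics (illustrative params)."""
        joints_3d = []
        for (u, v) in joints_2d.astype(int):
            z = depth[v, u]                  # depth value at the detected pixel
            x = (u - cx) * z / fx            # standard pinhole back-projection
            y = (v - cy) * z / fy
            joints_3d.append((x, y, z))
        return np.array(joints_3d)

    def sample_points_near_joints(points, joints_3d, radius=0.15, n_samples=1024):
        """Keep only cloud points within `radius` of any initial joint,
        then randomly subsample, reducing the regressor's input size."""
        d = np.linalg.norm(points[:, None, :] - joints_3d[None, :, :], axis=-1)
        near = points[d.min(axis=1) < radius]
        idx = np.random.choice(len(near), size=min(n_samples, len(near)), replace=False)
        return near[idx]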
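Likewise, a hedged sketch of the weakly supervised adversarial objective: a discriminator judges whether a predicted skeleton is a plausible human pose, and the pose regressor is trained with a 2D reprojection loss on weakly labeled data plus an adversarial term that pushes predictions toward valid poses. The discriminator architecture, the project() camera-projection helper, and the weight w_adv are assumptions for illustration; the paper's exact networks and losses may differ.

    import torch
    import torch.nn as nn

    class PoseDiscriminator(nn.Module):
        """Hypothetical discriminator scoring whether a skeleton of
        (num_joints x 3) coordinates looks like a valid human pose."""
        def __init__(self, num_joints=15):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(num_joints * 3, 256), nn.ReLU(),
                nn.Linear(256, 256), nn.ReLU(),
                nn.Linear(256, 1))

        def forward(self, pose):             # pose: (B, J, 3)
            return self.net(pose.flatten(1)) # real/fake logit per pose

    bce = nn.BCEWithLogitsLoss()

    def generator_loss(pred_3d, joints_2d_gt, project, disc, w_adv=0.1):
        """Weak supervision: 2D reprojection error plus an adversarial
        term; project() is an assumed camera-projection helper."""
        loss_2d = nn.functional.mse_loss(project(pred_3d), joints_2d_gt)
        logits = disc(pred_3d)
        loss_adv = bce(logits, torch.ones_like(logits))  # try to fool the discriminator
        return loss_2d + w_adv * loss_adv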