IEEE Trans Image Process. 2021;30:532-545. doi: 10.1109/TIP.2020.3037479. Epub 2020 Nov 24.
Recent emerging technologies such as AR/VR and HCI are driving demand for more comprehensive hand shape understanding, requiring not only the 3D hand skeleton pose but also the hand shape geometry. In this paper, we propose a deep learning framework to produce 3D hand shape from a single depth image. To address the challenge that capturing ground-truth 3D hand shape in the training dataset is non-trivial, we leverage synthetic data to construct a statistical hand shape model and adopt weak supervision from widely accessible hand skeleton pose annotations. To bridge the gap due to the different hand skeleton definitions in existing public datasets, we propose a joint regression network for hand pose adaptation. To reconstruct the hand shape, we use a Chamfer loss between the predicted hand shape and the point cloud from the input depth to learn the shape reconstruction model in a weakly-supervised manner. Experiments demonstrate that our model adapts well to real data and produces accurate hand shapes that outperform state-of-the-art methods both qualitatively and quantitatively.
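The weakly-supervised shape term the abstract describes is the symmetric Chamfer distance between the predicted mesh vertices and the depth-derived point cloud. A minimal NumPy sketch of that distance (the paper's actual loss operates on network outputs inside a deep learning framework; the function name and array shapes here are illustrative assumptions):

```python
import numpy as np

def chamfer_distance(pred_points, target_points):
    """Symmetric Chamfer distance between two point sets.

    pred_points:   (N, 3) array, e.g. predicted hand mesh vertices (assumed shape)
    target_points: (M, 3) array, e.g. point cloud back-projected from the depth image
    """
    # Pairwise squared distances, shape (N, M)
    diff = pred_points[:, None, :] - target_points[None, :, :]
    dist2 = np.sum(diff ** 2, axis=-1)
    # Each predicted point to its nearest target point, and vice versa
    return dist2.min(axis=1).mean() + dist2.min(axis=0).mean()
```

Because the distance is symmetric, it penalizes both predicted vertices that stray from the observed surface and observed points left uncovered by the prediction, which is what makes it usable as a shape loss without ground-truth mesh correspondences.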