Malik Jameel, Elhayek Ahmed, Stricker Didier
German Research Center for Artificial Intelligence, DFKI, 67663 Kaiserslautern, Germany.
Department of Informatics, University of Kaiserslautern, 67653 Kaiserslautern, Germany.
Sensors (Basel). 2019 Aug 31;19(17):3784. doi: 10.3390/s19173784.
Hand shape and pose recovery is essential for many computer vision applications, such as animating a personalized hand mesh in a virtual environment. Although many hand pose estimation methods exist, only a few deep-learning-based algorithms target 3D hand shape and pose from a single RGB or depth image. Jointly estimating hand shape and pose is very challenging because none of the existing real benchmarks provides ground-truth hand shape. For this reason, we propose a novel weakly-supervised approach for 3D hand shape and pose recovery (named WHSP-Net) from a single depth image, which learns shape from unlabeled real data and labeled synthetic data. To this end, we propose a framework consisting of three novel components. The first is a Convolutional Neural Network (CNN) based deep network that produces 3D joint positions from learned 3D bone vectors using a new layer. The second is a shape decoder that recovers a dense 3D hand mesh from sparse joints. The third is a depth synthesizer that reconstructs a 2D depth image from the 3D hand mesh. The whole pipeline is fine-tuned in an end-to-end manner. We demonstrate that our approach recovers reasonable hand shapes from real-world datasets as well as from a live depth-camera stream in real time. Our algorithm outperforms state-of-the-art methods that output more than the joint positions and shows competitive performance on the 3D pose estimation task.
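The abstract does not specify how the new layer maps bone vectors to joint positions. One plausible reading is that each joint is obtained by adding its bone vector to the position of its parent joint, starting from a root. The sketch below illustrates only that accumulation, in plain Python. The 21-joint layout, the `PARENTS` table, and the function name `joints_from_bones` are assumptions made for illustration and are not taken from the paper.

```python
# Hypothetical 21-joint hand skeleton: joint 0 is the wrist, and each of the
# five fingers contributes a chain of four joints. This layout is an
# assumption for illustration, not the topology stated in the paper.
PARENTS = [-1]  # the root (wrist) has no parent
for finger in range(5):
    first = 1 + 4 * finger
    PARENTS.append(0)  # the first joint of each finger attaches to the wrist
    PARENTS.extend([first, first + 1, first + 2])  # remaining joints chain up


def joints_from_bones(root, bones, parents=PARENTS):
    """Accumulate bone vectors along the kinematic tree.

    root:  (x, y, z) position of the root joint.
    bones: one (dx, dy, dz) per non-root joint, pointing from the parent
           joint to the child joint, ordered by child joint index.
    Returns a list of (x, y, z) joint positions, root first.
    """
    if len(bones) != len(parents) - 1:
        raise ValueError("expected one bone vector per non-root joint")
    joints = [tuple(root)]
    for j in range(1, len(parents)):
        # Parents always have a smaller index than their children here,
        # so the parent position is already available.
        px, py, pz = joints[parents[j]]
        dx, dy, dz = bones[j - 1]
        joints.append((px + dx, py + dy, pz + dz))
    return joints


if __name__ == "__main__":
    # Toy input: every bone is a unit step along x.
    joints = joints_from_bones((0.0, 0.0, 0.0), [(1.0, 0.0, 0.0)] * 20)
    print(joints[4])  # last joint of the first finger, four bones from the wrist
```

In a trained network the same accumulation would be expressed with differentiable tensor operations so that gradients from joint-level losses reach the bone-vector predictions. The plain-Python version is meant only to make the data flow concrete.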