School of Computing, National University of Computer and Emerging Sciences, Islamabad 44000, Pakistan.
Gokhale Method Institute, Stanford, CA 94305, USA.
Sensors (Basel). 2021 Apr 1;21(7):2415. doi: 10.3390/s21072415.
We propose an efficient and novel architecture for 3D articulated human pose retrieval and reconstruction from 2D landmarks extracted from a 2D synthetic image, an annotated 2D image, a real RGB image, or even a hand-drawn sketch. Given 2D joint positions in a single image, we devise a data-driven framework to infer the corresponding 3D human pose. To this end, we first normalize 3D human poses from a Motion Capture (MoCap) dataset by eliminating translation, orientation, and skeleton-size discrepancies, and then build a normalized 2D pose space by projecting a subset of joints of the normalized 3D poses onto 2D image planes under a variety of virtual cameras. With this approach, we not only transform the 3D pose space into a normalized 2D pose space but also resolve the 2D-3D cross-domain retrieval task efficiently. The proposed architecture searches the MoCap dataset for poses that are close to a given 2D query pose in a feature space built from specific joint sets. The retrieved poses are then used to construct a weak-perspective camera and a final 3D pose that minimizes the reconstruction error under this camera model. To estimate the unknown camera parameters, we introduce a two-fold nonlinear method: we exploit the retrieved similar poses and the viewing directions at which the MoCap dataset was sampled to minimize the projection error. Finally, we evaluate our approach thoroughly on a large number of heterogeneous 2D examples: synthetically generated 2D poses, 2D images with ground truth, a variety of real Internet images, and, as a proof of concept, 2D hand-drawn sketches of human poses. We conduct a pool of experiments for a quantitative study on the PARSE dataset. We also show that the proposed system yields competitive, convincing results in comparison to other state-of-the-art methods.
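As an illustration of the normalization and virtual-camera stage, the following is a minimal sketch assuming poses are given as (J, 3) NumPy arrays; the joint indices ROOT, L_HIP, R_HIP, the choice of y as the vertical axis, and the azimuth/elevation sampling grid are assumptions made for illustration, since the abstract does not fix the exact joint set or virtual-camera sampling:

```python
import numpy as np

ROOT, L_HIP, R_HIP = 0, 1, 4  # hypothetical joint indices; dataset-specific

def normalize_pose(pose3d):
    """Remove translation, global orientation, and skeleton-size differences."""
    p = pose3d - pose3d[ROOT]                    # translation: center at the root joint
    hips = p[L_HIP] - p[R_HIP]
    theta = np.arctan2(hips[2], hips[0])         # hip-axis angle in the ground plane
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, 0.0, s],                   # rotation about the vertical (y) axis
                  [0.0, 1.0, 0.0],
                  [-s, 0.0, c]])
    p = p @ R.T                                  # orientation: align hips with the x axis
    return p / np.linalg.norm(p, axis=1).max()   # skeleton size: unit maximal joint reach

def virtual_projections(pose3d, n_azimuth=12, n_elevation=3):
    """Project a normalized 3D pose onto 2D image planes from a grid of virtual views."""
    views = []
    for az in np.linspace(0.0, 2.0 * np.pi, n_azimuth, endpoint=False):
        for el in np.linspace(-np.pi / 6, np.pi / 6, n_elevation):
            ca, sa, ce, se = np.cos(az), np.sin(az), np.cos(el), np.sin(el)
            Ry = np.array([[ca, 0, sa], [0, 1, 0], [-sa, 0, ca]])
            Rx = np.array([[1, 0, 0], [0, ce, -se], [0, se, ce]])
            views.append((pose3d @ (Rx @ Ry).T)[:, :2])  # drop depth: orthographic 2D pose
    return np.stack(views)                               # (n_azimuth * n_elevation, J, 2)
```

Stacking these projections over the whole MoCap dataset yields the normalized 2D pose space in which a 2D query can be matched by nearest-neighbor search over the chosen joint subsets.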
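The two-fold camera estimation can be pictured as follows: a minimal sketch assuming a weak-perspective model x ≈ s · (R X)[:, :2] + t, in which a first, discrete stage scores the sampled MoCap viewing directions and a second stage refines the best candidate with generic nonlinear least squares. The function names and the rotation-vector parameterization are illustrative and not taken from the paper:

```python
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation

def project_weak(params, X):
    """Weak-perspective projection; params = [s, rx, ry, rz, tx, ty]."""
    s, rvec, t = params[0], params[1:4], params[4:6]
    R = Rotation.from_rotvec(rvec).as_matrix()
    return s * (X @ R.T)[:, :2] + t

def fit_camera(X, x2d, sampled_rotations):
    """Two-fold fit: discrete initialization from sampled views, then refinement."""
    best, best_err = None, np.inf
    for R in sampled_rotations:                  # stage 1: score MoCap viewing directions
        P = (X @ R.T)[:, :2]
        Pc = P - P.mean(axis=0)
        xc = x2d - x2d.mean(axis=0)
        s = (Pc.ravel() @ xc.ravel()) / (Pc.ravel() @ Pc.ravel() + 1e-12)
        err = np.linalg.norm(s * Pc - xc)        # residual after optimal scale
        if err < best_err:
            t = x2d.mean(axis=0) - s * P.mean(axis=0)
            rvec = Rotation.from_matrix(R).as_rotvec()
            best, best_err = np.concatenate([[s], rvec, t]), err
    # stage 2: nonlinear refinement of scale, rotation, and translation
    res = least_squares(lambda p: (project_weak(p, X) - x2d).ravel(), best)
    return res.x
```

Given the fitted camera, the final 3D pose can then be taken as the retrieved candidate (or a combination of candidates) whose weak-perspective projection best matches the query landmarks, consistent with the reconstruction-error criterion described in the abstract.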