Faculty of Engineering Technology, Hung Vuong University, Viet Tri City 35100, Vietnam.
Department of Intelligent Computer Systems, Czestochowa University of Technology, 42-218 Czestochowa, Poland.
Sensors (Basel). 2022 Jul 20;22(14):5419. doi: 10.3390/s22145419.
Three-dimensional human pose estimation is widely applied in sports, robotics, and healthcare. Over the past five years, CNN-based studies of 3D human pose estimation have been numerous and have yielded impressive results. However, these studies often focus only on improving estimation accuracy. In this paper, we propose a fast, unified end-to-end model for estimating 3D human pose, called YOLOv5-HR-TCM (YOLOv5-HRet-Temporal Convolution Model). The model follows the 2D-to-3D lifting approach and addresses each step of the estimation process: person detection, 2D human pose estimation, and 3D human pose estimation. It combines the best practices at each stage. We evaluate the model on the Human 3.6M dataset and compare it with other methods at each step. The method achieves high accuracy without sacrificing processing speed: the whole pipeline runs at 3.146 FPS on a low-end computer. In particular, we propose a sports scoring application based on the deviation angle between the estimated 3D human posture and the standard (reference) origin. The average deviation angle evaluated on the Human 3.6M dataset (Protocol #1-Pro #1) is 8.2 degrees.
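The abstract does not define exactly how the deviation angle is computed. The sketch below assumes one plausible definition: for each bone (a parent-to-child joint vector), take the angle between the estimated and reference bone vectors, then average over all bones. The skeleton `BONES` and the helper names are hypothetical, chosen only for illustration; they are not the paper's implementation or the Human 3.6M joint layout.

```python
import math
from typing import List, Sequence, Tuple

Joint = Tuple[float, float, float]

# Hypothetical skeleton: (parent, child) joint-index pairs defining bones.
# This toy 4-joint chain is for illustration only; it is NOT the
# Human 3.6M skeleton used in the paper.
BONES: List[Tuple[int, int]] = [(0, 1), (1, 2), (2, 3)]


def bone_vector(pose: Sequence[Joint], parent: int, child: int) -> Joint:
    """Vector pointing from the parent joint to the child joint."""
    px, py, pz = pose[parent]
    cx, cy, cz = pose[child]
    return (cx - px, cy - py, cz - pz)


def angle_between(u: Joint, v: Joint) -> float:
    """Angle in degrees between two 3D vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("zero-length bone")
    # Clamp to [-1, 1] to guard against floating-point overshoot before acos.
    cos_theta = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.degrees(math.acos(cos_theta))


def mean_deviation_angle(
    estimated: Sequence[Joint],
    reference: Sequence[Joint],
    bones: Sequence[Tuple[int, int]] = BONES,
) -> float:
    """Average per-bone angle (degrees) between an estimated and a reference pose.

    Assumed definition: mean over bones of the angle between corresponding
    bone vectors. The paper's exact formula may differ.
    """
    if len(estimated) != len(reference):
        raise ValueError("poses must have the same number of joints")
    angles = [
        angle_between(bone_vector(estimated, p, c), bone_vector(reference, p, c))
        for p, c in bones
    ]
    return sum(angles) / len(angles)


if __name__ == "__main__":
    # Reference: straight chain along x. Estimate: last bone bent 90 degrees.
    reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
    estimated = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (2.0, 1.0, 0.0)]
    print(mean_deviation_angle(estimated, reference))  # two bones at 0°, one at 90°
```

Because bone vectors are joint differences and only their directions enter the angle, this kind of score ignores global translation and limb-length differences between the performer and the reference, which is one reason an angle-based measure can suit scoring across people of different body sizes.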