Dill Sebastian, Ahmadi Arjang, Grimmer Martin, Haufe Dennis, Rohr Maurice, Zhao Yanhua, Sharbafi Maziar, Hoog Antink Christoph
KIS*MED (AI Systems in Medicine), Technische Universität Darmstadt, 64283 Darmstadt, Germany.
Lauflabor (Locomotion Laboratory), Centre for Cognitive Science, Technische Universität Darmstadt, 64289 Darmstadt, Germany.
Sensors (Basel). 2024 Dec 4;24(23):7772. doi: 10.3390/s24237772.
In recent years, significant research has been conducted on video-based human pose estimation (HPE). While monocular two-dimensional (2D) HPE has been shown to achieve high performance, monocular three-dimensional (3D) HPE is a more challenging problem. However, since human motion happens in 3D space, 3D HPE offers a more accurate representation of the human body, which makes it more usable for complex tasks such as the analysis of physical exercise. We propose a method that reconstructs 3D poses from MediaPipe Pose 2D HPE on stereo cameras and a fusion algorithm that requires no prior stereo calibration, combining the high accuracy of 2D HPE with the increased usability of 3D coordinates. We evaluate this method on a self-recorded database focused on physical exercise to determine what accuracy can be achieved and whether this accuracy is sufficient to recognize errors in exercise performance. We find that our method achieves significantly better performance than monocular 3D HPE (median RMSE of 30.1 compared to 56.3, p-value below 10^-6) and show that this performance is sufficient for error recognition.
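The headline comparison rests on a median root-mean-square error (RMSE). The abstract does not state how the RMSE is aggregated or in which unit it is reported, so the following is only a minimal sketch under stated assumptions: one RMSE per recording, computed over the Euclidean distances between estimated and reference 3D keypoints, followed by the median across recordings. The function names `rmse` and `median_rmse` and the data layout are hypothetical and are not taken from the paper.

```python
import math
from statistics import median


def rmse(estimated, reference):
    """RMSE of the Euclidean distances between paired 3D keypoints.

    Assumption: both arguments are equally long sequences of (x, y, z)
    tuples in the same coordinate frame and unit.
    """
    if len(estimated) != len(reference) or not estimated:
        raise ValueError("need two non-empty keypoint lists of equal length")
    squared = [math.dist(e, r) ** 2 for e, r in zip(estimated, reference)]
    return math.sqrt(sum(squared) / len(squared))


def median_rmse(recordings):
    """Median of the per-recording RMSE values.

    Assumption: `recordings` is an iterable of (estimated, reference) pairs,
    one pair per recording.
    """
    return median(rmse(est, ref) for est, ref in recordings)
```

Computing this value once for the stereo fusion output and once for the monocular 3D output on the same recordings would give the two numbers being compared. The significance test behind the reported p-value is not named in the abstract and is not sketched here.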