Vicarious Perception Technologies (VicarVision), 1015 AH Amsterdam, The Netherlands.
Computer Vision Lab, Delft University of Technology, 2628 XE Delft, The Netherlands.
Sensors (Basel). 2022 Dec 28;23(1):341. doi: 10.3390/s23010341.
Markerless estimation of 3D kinematics has great potential for clinically diagnosing and monitoring movement disorders without referrals to expensive motion capture labs; however, current approaches are limited because they estimate a person's kinematics from video in multiple decoupled steps. Most current techniques first detect the pose of the body and then fit a musculoskeletal model to the data for accurate kinematic estimation. Errors in the training data of the pose detection algorithms, model scaling, as well as the requirement of multiple cameras limit the use of these techniques in a clinical setting. Our goal is to pave the way toward fast, easily applicable, and accurate 3D kinematic estimation. To this end, we propose a novel approach for direct 3D human kinematic estimation (D3KE) from videos using deep neural networks. Our experiments demonstrate that the proposed end-to-end training is robust and outperforms 2D and 3D markerless motion capture based kinematic estimation pipelines in terms of joint angle error by a large margin (35%, from 5.44 to 3.54 degrees). We show that D3KE is superior to the multi-step approach and can run at video frame rate. This technology shows the potential for clinical analysis from mobile devices in the future.
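The reported margin can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function names are hypothetical, and the mean absolute error over joint angles is an assumed form of the metric, since the abstract does not define how the joint angle error is computed.

```python
def relative_reduction(baseline: float, improved: float) -> float:
    """Fractional reduction of an error when going from baseline to improved."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - improved) / baseline


def mean_abs_angle_error(predicted, reference):
    """Mean absolute difference in degrees between two equal-length angle lists.

    Assumption: the abstract does not state the exact metric; this is one
    common way to summarize joint angle error, not necessarily the authors'.
    """
    if len(predicted) != len(reference) or len(predicted) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - r) for p, r in zip(predicted, reference)) / len(predicted)


# Values taken from the abstract: 5.44 degrees (multi-step) vs 3.54 degrees (D3KE).
reduction = relative_reduction(5.44, 3.54)
print(f"{reduction:.1%}")  # 34.9%, which the abstract rounds to 35%
```

The same `relative_reduction` helper applies to any pair of error values, so it can be reused to compare per-joint results if those are available.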