School of Information Science and Engineering, Ningbo University, Ningbo 315211, China.
Ningbo Institute of Materials Technology and Engineering, Chinese Academy of Sciences, Ningbo 315201, China.
Sensors (Basel). 2023 Mar 28;23(7):3547. doi: 10.3390/s23073547.
Three-dimensional (3D) pose estimation is widely used in human motion analysis applications, where inertia-based path estimation is gradually being adopted. Systems based on commercial inertial measurement units (IMUs) usually rely on dense, complex wearable sensor setups and time-consuming calibration, which intrude on the subject and hinder free body movement. Sparse-IMU-based methods have drawn research attention recently. Existing sparse-IMU-based 3D pose estimation methods use neural networks to obtain human poses from temporal feature information. However, these methods still suffer from issues such as body shaking, body tilt, and movement ambiguity. This paper presents an approach that improves 3D human pose estimation by fusing temporal and spatial features. Based on a multistage encoder-decoder network, a temporal convolutional encoder and a human kinematics regression decoder were designed. The final 3D pose is predicted from the temporal feature information and the human kinematic feature information. Extensive experiments were conducted on two benchmark datasets for 3D human pose estimation. Compared to state-of-the-art methods, the mean per-joint position error decreased by 13.6% and 19.4% on the TotalCapture and DIP-IMU datasets, respectively. The quantitative comparison demonstrates that combining temporal information with human kinematic topology can improve pose accuracy.
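The abstract reports its results as mean per-joint position error but does not define the metric. As a minimal sketch, assuming the common definition (the Euclidean distance between predicted and ground-truth joint positions, averaged over all joints and frames), it could be computed as below. The function names `mpjpe` and `relative_reduction` are hypothetical, the abstract does not say whether poses are root-aligned before the error is measured, and reading the reported percentages as relative reductions is also an assumption.

```python
import math
from typing import Sequence

Joint = Sequence[float]   # one joint position, e.g. (x, y, z)
Frame = Sequence[Joint]   # all joints of one pose


def mpjpe(pred: Sequence[Frame], truth: Sequence[Frame]) -> float:
    """Mean Euclidean joint distance over all frames (assumed definition)."""
    if len(pred) != len(truth):
        raise ValueError("pred and truth must contain the same number of frames")
    total = 0.0
    count = 0
    for pred_frame, truth_frame in zip(pred, truth):
        if len(pred_frame) != len(truth_frame):
            raise ValueError("each frame must contain the same number of joints")
        for p, t in zip(pred_frame, truth_frame):
            total += math.dist(p, t)  # Euclidean distance between two points
            count += 1
    if count == 0:
        raise ValueError("no joints to evaluate")
    return total / count


def relative_reduction(baseline: float, proposed: float) -> float:
    """Percentage by which `proposed` error is lower than `baseline` error."""
    return 100.0 * (baseline - proposed) / baseline


# Toy data, not taken from the paper: one frame with two joints.
pred = [[(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]]
truth = [[(0.0, 0.0, 3.0), (1.0, 4.0, 0.0)]]
error = mpjpe(pred, truth)  # joint distances are 3 and 4
```

Under this reading, a statement such as "decreased by 13.6%" compares the MPJPE of the proposed method against that of the baseline on the same dataset; the values above are illustrative only.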