Korea Electronics Technology Institute, Seongnam-si 13509, Gyeonggi-do, Korea.
Sensors (Basel). 2022 Jun 27;22(13):4846. doi: 10.3390/s22134846.
Motion capture with sparse inertial sensors is an approach that avoids the occlusion and cost problems of vision-based methods, making it suitable for virtual reality applications and usable in complex environments. However, VR applications also need to track the user's location in real-world space, which is hard to obtain with inertial sensors alone. In this paper, we present Fusion Poser, which combines deep-learning-based pose estimation with location tracking, using six inertial measurement units and the head-tracking sensor provided by a head-mounted display. To estimate human poses, we propose a bidirectional recurrent neural network with a convolutional long short-term memory (ConvLSTM) layer, which achieves higher accuracy and stability by preserving spatio-temporal properties. To locate the user in real-world coordinates, our method integrates the estimated joint poses with the pose reported by the head tracker. To train the model, we gathered public motion capture datasets with synthesized IMU measurements and also created a real-world dataset. In the evaluation, our method showed higher accuracy and more robust estimation, especially when the user adopted lowered poses such as a squat or a bow.
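The core architectural idea in the abstract, a bidirectional recurrent network built from convolutional LSTM cells over IMU sequences, can be sketched roughly as follows. This is an illustrative sketch, not the authors' implementation; the use of 1-D convolutions, the channel counts, the kernel size, and the layout of the six IMU readings along the spatial axis are all assumptions.

```python
# Minimal sketch (not the authors' code): a 1-D ConvLSTM cell that a
# bidirectional recurrent model could stack over windows of IMU frames.
import torch
import torch.nn as nn

class ConvLSTMCell(nn.Module):
    def __init__(self, in_ch, hid_ch, kernel=3):
        super().__init__()
        self.hid_ch = hid_ch
        # One convolution produces the input/forget/cell/output gates at once,
        # so the recurrent state keeps a per-sensor spatial layout.
        self.gates = nn.Conv1d(in_ch + hid_ch, 4 * hid_ch, kernel, padding=kernel // 2)

    def forward(self, x, state):
        # x: (batch, in_ch, sensors); state: (h, c), each (batch, hid_ch, sensors)
        h, c = state
        i, f, g, o = torch.chunk(self.gates(torch.cat([x, h], dim=1)), 4, dim=1)
        c = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
        h = torch.sigmoid(o) * torch.tanh(c)
        return h, c

def bidirectional_convlstm(x_seq, fwd_cell, bwd_cell):
    # Unroll one cell forward in time and one backward, then concatenate
    # their hidden states per frame. x_seq: (time, batch, in_ch, sensors).
    T, B, _, S = x_seq.shape
    h_f = x_seq.new_zeros(B, fwd_cell.hid_ch, S); c_f = h_f.clone()
    h_b = x_seq.new_zeros(B, bwd_cell.hid_ch, S); c_b = h_b.clone()
    outs_f, outs_b = [], []
    for t in range(T):
        h_f, c_f = fwd_cell(x_seq[t], (h_f, c_f))
        h_b, c_b = bwd_cell(x_seq[T - 1 - t], (h_b, c_b))
        outs_f.append(h_f)
        outs_b.append(h_b)
    outs_b.reverse()
    return torch.stack([torch.cat([f, b], dim=1) for f, b in zip(outs_f, outs_b)])

# Example shapes (assumed): a window of 60 frames, batch of 1, 12 input
# channels (e.g., orientation + acceleration features), six sensor "columns".
# fwd = ConvLSTMCell(12, 64); bwd = ConvLSTMCell(12, 64)
# feats = bidirectional_convlstm(torch.randn(60, 1, 12, 6), fwd, bwd)  # (60, 1, 128, 6)
```

Replacing the fully connected gates of a standard LSTM with convolutions lets the recurrent state retain a per-sensor spatial arrangement, which is one plausible reading of the abstract's claim that the ConvLSTM layer preserves spatio-temporal properties.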