Tian Wei, Gao Zhong, Tan Dayi
Institute of Intelligent Vehicles, School of Automotive Studies, Tongji University, Shanghai, China.
Front Neurosci. 2023 Jul 19;17:1201088. doi: 10.3389/fnins.2023.1201088. eCollection 2023.
Vision-based human pose estimation has been widely applied in tasks such as augmented reality, action recognition, and human-machine interaction. Current approaches favor the keypoint detection-based paradigm, which eases learning by circumventing the highly non-linear problem of directly regressing keypoint coordinates. In this paradigm, however, each keypoint is predicted from only a small surrounding region of a Gaussian-like heatmap, which wastes much of the information in the remaining regions and can even limit model optimization. In this paper, we design a new k-block multi-person pose estimation architecture that applies a voting mechanism over the entire heatmap to simultaneously infer the keypoints and their uncertainties. To further improve keypoint estimation, the architecture leverages the SMPL 3D human body model and iteratively mines information about human body structure to correct the pose estimated from a single image. In experiments on the 3DPW dataset, it improves on the state of the art by about 8 mm in MPJPE and 5 mm in PA-MPJPE. Furthermore, its ability to run in real time makes it a potential fit for multi-person pose estimation in complex scenarios.
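The idea of voting over the entire heatmap to obtain both a keypoint and its uncertainty can be illustrated with a minimal sketch. This is not the architecture described in the paper; it swaps in a plain softmax-weighted expectation (often called soft-argmax) as a stand-in for the voting mechanism, and the function name `vote_keypoint` and the use of a 2x2 covariance matrix as the uncertainty measure are assumptions made purely for illustration.

```python
import numpy as np

def vote_keypoint(heatmap):
    """Hypothetical helper: estimate a keypoint and its uncertainty from a full heatmap.

    Every pixel casts a vote for its own (x, y) location, weighted by the
    softmax of its heatmap score. Returns the weighted mean location and the
    weighted 2x2 covariance of the votes (an assumed uncertainty measure).
    """
    h, w = heatmap.shape
    # Softmax over all pixels: weights are positive and sum to 1.
    logits = heatmap - heatmap.max()  # subtract max for numerical stability
    weights = np.exp(logits)
    weights /= weights.sum()

    # Pixel coordinates as an (h*w, 2) array of (x, y) pairs.
    ys, xs = np.mgrid[0:h, 0:w]
    coords = np.stack([xs.ravel(), ys.ravel()], axis=1).astype(float)
    w_flat = weights.ravel()

    # Weighted mean of all votes is the keypoint estimate.
    mean = w_flat @ coords
    # Weighted covariance of the votes around the mean is the uncertainty.
    centered = coords - mean
    cov = (centered * w_flat[:, None]).T @ centered
    return mean, cov

if __name__ == "__main__":
    demo = np.zeros((8, 8))
    demo[3, 5] = 50.0  # sharp peak at x=5, y=3
    mean, cov = vote_keypoint(demo)
    print("keypoint (x, y):", mean)
    print("covariance:\n", cov)
```

Under these assumptions, a sharply peaked heatmap yields a near-zero covariance, while a flat heatmap yields the image center with a large covariance, which is the sense in which the whole heatmap, rather than a small neighborhood of the peak, contributes to the estimate.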