Lv Chao, Ma Geyao
College of Electronic Information Engineering, Changchun University of Science and Technology, Changchun, China.
PLoS One. 2025 Jun 25;20(6):e0326232. doi: 10.1371/journal.pone.0326232. eCollection 2025.
Human pose estimation (HPE) has made significant progress with deep learning; however, it still struggles with occlusions, complex poses, and crowded multi-person scenes. To address these issues, we propose PoseNet++, a novel approach based on a 3-stacked hourglass architecture that incorporates three key innovations: the multi-scale spatial pyramid attention hourglass module (MSPAHM), coordinate-channel prior convolutional attention (C-CPCA), and the PinSK Bottleneck Residual Module (PBRM). MSPAHM enhances long-range channel dependencies, enabling the model to better capture structural relationships between limb joints, particularly under occlusion. C-CPCA combines coordinate attention (CA) and channel prior convolutional attention (CPCA) to prioritize keypoint regions and reduce confusion in complex multi-person scenarios. PBRM improves pose estimation accuracy by optimizing the receptive field and convolutional kernel selection, thereby strengthening the network's feature extraction for multi-scale and complex poses. On the MPII validation set, PoseNet++ improves the PCKh score by 3.3% relative to the baseline 3-stacked hourglass network, while reducing the number of model parameters and the number of floating-point operations by 60.3% and 53.1%, respectively. Compared with other mainstream human pose estimation models from recent years, PoseNet++ achieves state-of-the-art performance on the MPII, LSP, COCO, and CrowdPose datasets, with much lower model complexity than methods of similar accuracy.
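For readers unfamiliar with the PCKh metric reported above, the following is a minimal sketch of how such a score is commonly computed: a predicted keypoint counts as correct when its distance to the ground truth is within a fraction `alpha` of the person's head segment length (0.5 for the usual PCKh@0.5). The function name `pckh`, its arguments, and the toy data are illustrative assumptions, not the authors' evaluation code.

```python
import math


def pckh(pred, gt, head_sizes, visible=None, alpha=0.5):
    """Fraction of keypoints whose error is within alpha * head size.

    pred, gt:    list of persons, each a list of (x, y) keypoints in pixels.
    head_sizes:  one head segment length per person, in pixels.
    visible:     optional per-person lists of booleans; joints marked
                 False are skipped (assumed to be unannotated).
    alpha:       threshold as a fraction of head size (0.5 for PCKh@0.5).
    """
    correct = 0
    total = 0
    for i, (pred_person, gt_person) in enumerate(zip(pred, gt)):
        threshold = alpha * head_sizes[i]
        for j, ((px, py), (gx, gy)) in enumerate(zip(pred_person, gt_person)):
            if visible is not None and not visible[i][j]:
                continue  # skip joints without a usable annotation
            total += 1
            if math.hypot(px - gx, py - gy) <= threshold:
                correct += 1
    # Return 0.0 rather than dividing by zero when nothing was evaluated.
    return correct / total if total else 0.0


# Toy data (made up for illustration): one person, three joints, head size 20 px,
# so the threshold is 10 px. The joint errors are 2, 20, and 3 px.
pred = [[(10.0, 10.0), (50.0, 50.0), (100.0, 100.0)]]
gt = [[(12.0, 10.0), (50.0, 70.0), (100.0, 103.0)]]
print(pckh(pred, gt, [20.0]))  # two of the three joints fall within the threshold
```

Normalizing by head size rather than by a fixed pixel distance keeps the score comparable across people who appear at different scales in the image.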