Ge Congying, Qin Wei Fu
School of Physical Education, Guangxi University of Science and Technology, Liuzhou, 545006, China.
College of Physical Education, Beibu Gulf University, Qinzhou, 535011, Guangxi, China.
Sci Rep. 2025 Jul 30;15(1):27754. doi: 10.1038/s41598-025-12620-4.
Human pose estimation is a fundamental task in computer vision. However, the performance of existing methods fluctuates when processing human targets at different scales, especially in outdoor scenes where target distances and viewing angles change frequently. This paper proposes ScaleFormer, a novel scale-invariant pose estimation framework that addresses multi-scale pose estimation by combining the hierarchical feature extraction capabilities of Swin Transformer with the fine-grained feature enhancement mechanisms of ConvNeXt. We design an adaptive feature representation mechanism that enables the model to maintain consistent performance across scales. Extensive experiments on the MPII human pose dataset demonstrate that ScaleFormer significantly outperforms existing methods on multiple metrics, including PCKh, scale consistency score, and keypoint mean average precision. Notably, under extreme scaling conditions (scaling factor 2.0), ScaleFormer's scale consistency score exceeds the baseline model by 48.8 percentage points, and under 30% random occlusion, keypoint detection accuracy improves by 20.5 percentage points. Ablation experiments further verify the complementary contributions of the two core components. These results indicate that ScaleFormer offers significant advantages in practical application scenarios and opens new research directions for pose estimation.
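The PCKh metric reported above is the standard head-normalized variant of the Percentage of Correct Keypoints used on MPII: a predicted keypoint counts as correct if its distance to the ground truth is below a fraction of the person's head segment length. A minimal sketch of PCKh@alpha (function name and array shapes are illustrative, not from the paper):

```python
import numpy as np

def pckh(pred, gt, head_sizes, alpha=0.5):
    """Head-normalized Percentage of Correct Keypoints (PCKh@alpha).

    pred, gt:    (N, K, 2) predicted / ground-truth keypoint coordinates
                 for N people with K keypoints each.
    head_sizes:  (N,) per-person head segment lengths used to normalize
                 the error threshold.
    A keypoint is correct when its Euclidean error is below
    alpha * head_size for that person.
    """
    dists = np.linalg.norm(pred - gt, axis=-1)   # (N, K) pixel errors
    thresh = alpha * head_sizes[:, None]         # (N, 1) per-person cutoff
    return float((dists < thresh).mean())        # fraction of correct joints
```

For example, with a head length of 1.0 and alpha = 0.5, a joint predicted 1.0 units away from the ground truth is counted as missed, while joints within 0.5 units are counted as correct.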