


ScaleFormer architecture for scale invariant human pose estimation with enhanced mixed features.

Authors

Ge Congying, Qin Wei Fu

Affiliations

School of Physical Education, Guangxi University of Science and Technology, Liuzhou, 545006, China.

College of Physical Education, Beibu Gulf University, Qinzhou, 535011, Guangxi, China.

Publication

Sci Rep. 2025 Jul 30;15(1):27754. doi: 10.1038/s41598-025-12620-4.

DOI: 10.1038/s41598-025-12620-4
PMID: 40739115
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12311106/
Abstract

Human pose estimation is a fundamental task in computer vision. However, existing methods face performance fluctuation challenges when processing human targets at different scales, especially in outdoor scenes where target distances and viewing angles frequently change. This paper proposes ScaleFormer, a novel scale-invariant pose estimation framework that effectively addresses multi-scale pose estimation problems by innovatively combining the hierarchical feature extraction capabilities of Swin Transformer with the fine-grained feature enhancement mechanisms of ConvNeXt. We design an adaptive feature representation mechanism that enables the model to maintain consistent performance across different scales. Extensive experiments on the MPII human pose dataset demonstrate that ScaleFormer significantly outperforms existing methods on multiple metrics including PCKh, scale consistency score, and keypoint mean average precision. Notably, under extreme scaling conditions (scaling factor 2.0), ScaleFormer's scale consistency score exceeds the baseline model by 48.8 percentage points. Under 30% random occlusion conditions, keypoint detection accuracy improves by 20.5 percentage points. Experiments further verify the complementary contributions of the two core components. These results indicate that ScaleFormer has significant advantages in practical application scenarios and provides new research directions for the field of pose estimation.
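The abstract reports results on PCKh, the standard MPII evaluation metric: a predicted keypoint counts as correct when it lies within a fraction (typically 0.5) of the person's head-segment length of the ground truth. A minimal NumPy sketch of that metric (function and argument names are illustrative, not taken from the paper):

```python
import numpy as np

def pckh(pred, gt, head_sizes, alpha=0.5):
    """PCKh: fraction of keypoints whose prediction lies within
    alpha * head-segment length of the ground-truth location.

    pred, gt:    (N, K, 2) arrays of keypoint coordinates.
    head_sizes:  (N,) per-person head-segment lengths in pixels.
    """
    dists = np.linalg.norm(pred - gt, axis=-1)   # (N, K) Euclidean errors
    thresh = alpha * head_sizes[:, None]         # (N, 1) per-person threshold
    return float((dists <= thresh).mean())
```

For example, with one person, a perfect first keypoint, and a second keypoint off by 4 px against a 4 px head segment (threshold 2 px at alpha=0.5), the score is 0.5.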

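The scale consistency score is reported but not defined on this page. One plausible (hypothetical) formulation runs the same predictor on rescaled copies of an image, maps the keypoints back to the reference resolution, and converts the mean disagreement into a score where 1.0 means perfectly scale-invariant predictions. A sketch under that assumption, with a nearest-neighbour `rescale` helper standing in for a real image resizer:

```python
import numpy as np

def rescale(img, s):
    # Nearest-neighbour resize via index sampling (illustration only).
    h, w = img.shape[:2]
    ys = (np.arange(int(h * s)) / s).astype(int).clip(0, h - 1)
    xs = (np.arange(int(w * s)) / s).astype(int).clip(0, w - 1)
    return img[ys][:, xs]

def scale_consistency(predict, image, scales=(0.5, 2.0)):
    """Hypothetical scale-consistency score: agreement between
    predictions on rescaled inputs and the scale-1.0 prediction.
    `predict(img)` is assumed to return a (K, 2) keypoint array.
    """
    base = predict(image)                    # reference at scale 1.0
    errs = []
    for s in scales:
        kpts = predict(rescale(image, s)) / s   # map back to original coords
        errs.append(np.linalg.norm(kpts - base, axis=-1).mean())
    return 1.0 / (1.0 + float(np.mean(errs)))   # 1.0 = perfectly consistent
```

A perfectly scale-equivariant predictor scores 1.0 here; the paper's exact definition (e.g. how errors are normalized across the 0.5–2.0 scaling range) may differ.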

[Figures 1–12 and one supplementary figure; full-resolution images are hosted on the PMC CDN for article PMC12311106.]

Similar Articles

1. ScaleFormer architecture for scale invariant human pose estimation with enhanced mixed features.
   Sci Rep. 2025 Jul 30;15(1):27754. doi: 10.1038/s41598-025-12620-4.
2. PoseNet++: A multi-scale and optimized feature extraction network for high-precision human pose estimation.
   PLoS One. 2025 Jun 25;20(6):e0326232. doi: 10.1371/journal.pone.0326232. eCollection 2025.
3. Enhanced Pose Estimation for Badminton Players via Improved YOLOv8-Pose with Efficient Local Attention.
   Sensors (Basel). 2025 Jul 17;25(14):4446. doi: 10.3390/s25144446.
4. Leveraging a foundation model zoo for cell similarity search in oncological microscopy across devices.
   Front Oncol. 2025 Jun 18;15:1480384. doi: 10.3389/fonc.2025.1480384. eCollection 2025.
5. Facial Landmark-Driven Keypoint Feature Extraction for Robust Facial Expression Recognition.
   Sensors (Basel). 2025 Jun 16;25(12):3762. doi: 10.3390/s25123762.
6. Unsupervised retinal image registration based on D-STUNet and progressive keypoint screening strategy.
   Biomed Phys Eng Express. 2025 Jul 9;11(4). doi: 10.1088/2057-1976/ade9c6.
7. Multi-level channel-spatial attention and light-weight scale-fusion network (MCSLF-Net): multi-level channel-spatial attention and light-weight scale-fusion transformer for 3D brain tumor segmentation.
   Quant Imaging Med Surg. 2025 Jul 1;15(7):6301-6325. doi: 10.21037/qims-2025-354. Epub 2025 Jun 30.
8. DASNet a dual branch multi level attention sheep counting network.
   Sci Rep. 2025 Jul 2;15(1):23228. doi: 10.1038/s41598-025-97929-w.
9. Federated Learning for Human Pose Estimation on Non-IID Data via Gradient Coordination.
   Sensors (Basel). 2025 Jul 12;25(14):4372. doi: 10.3390/s25144372.
10. Stabilizing machine learning for reproducible and explainable results: A novel validation approach to subject-specific insights.
   Comput Methods Programs Biomed. 2025 Jun 21;269:108899. doi: 10.1016/j.cmpb.2025.108899.
