MambaPose: A Human Pose Estimation Based on Gated Feedforward Network and Mamba.

Authors

Zhang Jianqiang, Hou Jing, He Qiusheng, Yuan Zhengwei, Xue Hao

Affiliations

School of Electronic Information Engineering, Taiyuan University of Science and Technology, Taiyuan 030024, China.

College of Modern Urban Construction Industry, Tianjin Chengjian University, Tianjin 300384, China.

Publication information

Sensors (Basel). 2024 Dec 20;24(24):8158. doi: 10.3390/s24248158.

DOI: 10.3390/s24248158
PMID: 39771893
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11679066/
Abstract

Human pose estimation is an important research direction in the field of computer vision, which aims to accurately identify the position and posture of keypoints of the human body through images or videos. However, multi-person pose estimation yields false detection or missed detection in dense crowds, and it is still difficult to detect small targets. In this paper, we propose a Mamba-based human pose estimation method. First, we design a GMamba structure to be used as a backbone network to extract human keypoints. A gating mechanism is introduced into the linear layer of Mamba, which allows the model to dynamically adjust the weights according to the different input images to locate the human keypoints more precisely. Secondly, GMamba as the backbone network can effectively solve the long-sequence problem. The direct use of convolutional downsampling reduces selectivity for different stages of information flow. We used slice downsampling (SD) to reduce the resolution of the feature map to half the original size, and then fused local features from four different locations. The fusion of multi-channel information helped the model obtain rich pose information. Finally, we introduced an adaptive threshold focus loss (ATFL) to dynamically adjust the weights of different keypoints. We assigned higher weights to error-prone keypoints to strengthen the model's attention to these points. Thus, we effectively improved the accuracy of keypoint identification in cases of occlusion, complex background, etc., and significantly improved the overall performance and anti-interference ability of pose estimation. Experimental results showed that the AP and AP50 of the proposed algorithm on the COCO 2017 validation set were 72.2 and 92.6. Compared with the typical algorithm, it was improved by 1.1% on AP50. The proposed method can effectively detect the keypoints of the human body, and provides stronger robustness and accuracy for the estimation of human posture in complex scenes.
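
The slice downsampling (SD) step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact slicing pattern or the fusion operator, so the code below assumes a 2x2 interleaved (space-to-depth style) slicing and a plain channel-wise concatenation. The function name `slice_downsample` and the `(C, H, W)` layout are hypothetical, and this is not the authors' implementation.

```python
import numpy as np


def slice_downsample(x: np.ndarray) -> np.ndarray:
    """Halve the height and width of a (C, H, W) feature map by slicing.

    Assumption: the "four different locations" are the four pixel
    positions of each 2x2 neighbourhood. Each slice keeps one position,
    and the four half-resolution slices are stacked along the channels.
    """
    _, h, w = x.shape
    if h % 2 or w % 2:
        raise ValueError("height and width must both be even")
    parts = [
        x[:, 0::2, 0::2],  # even rows, even columns
        x[:, 1::2, 0::2],  # odd rows, even columns
        x[:, 0::2, 1::2],  # even rows, odd columns
        x[:, 1::2, 1::2],  # odd rows, odd columns
    ]
    # No pixel is discarded: resolution is halved, channels are quadrupled.
    return np.concatenate(parts, axis=0)


feature_map = np.arange(2 * 4 * 4, dtype=np.float32).reshape(2, 4, 4)
out = slice_downsample(feature_map)
print(out.shape)  # (8, 2, 2)
```

Unlike a strided convolution, this kind of slicing keeps every input value, so a later learned layer (for example a 1x1 convolution, also an assumption here) can decide how to mix the four positions.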

Figures

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7295/11679066/cae288067e99/sensors-24-08158-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7295/11679066/82d6325ce233/sensors-24-08158-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7295/11679066/9ded3cf51908/sensors-24-08158-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7295/11679066/f92dc3a508e6/sensors-24-08158-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7295/11679066/88c89ad66d28/sensors-24-08158-g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7295/11679066/6f33701497f6/sensors-24-08158-g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7295/11679066/909ff0819238/sensors-24-08158-g007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7295/11679066/73a3cf4a9388/sensors-24-08158-g008.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7295/11679066/b876745cb955/sensors-24-08158-g009.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7295/11679066/ba56bc5aabb9/sensors-24-08158-g010.jpg
