Cai Shaobin, Xu Han, Cai Wanchen, Mo Yuchang, Wei Liansuo
College of Information Engineering, Huzhou University, Huzhou, China.
Shenzhen Institute for Advanced Study, University of Electronic Science and Technology of China, Shenzhen, China.
Sci Rep. 2025 May 1;15(1):15284. doi: 10.1038/s41598-025-00259-0.
Deep learning-based human pose estimation uses deep neural networks to detect and localize human body poses in images or videos. However, multi-person pose estimation remains challenging because of partial occlusions and overlaps among multiple human bodies and their body parts. To address these issues, we propose EE-YOLOv8, a human pose estimation network built on the YOLOv8 framework that integrates an Efficient Multi-scale Receptive Field (EMRF) module and an Expanded Feature Pyramid Network (EFPN). First, the EMRF module strengthens the model's feature representation capability. Second, the EFPN optimizes cross-level information exchange and improves multi-scale feature integration. Finally, Wise-IoU replaces the traditional Intersection over Union (IoU) loss, improving detection accuracy through a more precise measure of overlap between predicted and ground-truth bounding boxes. We evaluate EE-YOLOv8 on the MS COCO 2017 dataset. Compared with YOLOv8-Pose, EE-YOLOv8 achieves an AP of 89.0% at an IoU threshold of 0.5 (an improvement of 3.3 percentage points) and an AP of 65.6% over the IoU range 0.5-0.95 (an improvement of 5.8 percentage points), while maintaining the lowest parameter count and computational complexity among all analyzed algorithms. These results demonstrate that EE-YOLOv8 is highly competitive with other mainstream methods.
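The abstract does not give the exact loss formulation; as a minimal sketch of the idea, the code below computes plain IoU between two axis-aligned boxes and a Wise-IoU-v1-style loss, in which the IoU loss is scaled by a distance-based focusing factor exp(d²/c²), where d is the distance between box centers and c the diagonal of the smallest enclosing box. The function names and box format `(x1, y1, x2, y2)` are illustrative assumptions, not the paper's implementation.

```python
import math


def iou(box_a, box_b):
    """Plain IoU between two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Intersection rectangle (clamped to zero if the boxes do not overlap).
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def wiou_v1_loss(pred, gt):
    """Wise-IoU-v1-style loss: (1 - IoU) scaled by exp(d^2 / c^2).

    In the original Wise-IoU formulation c^2 is detached from the gradient;
    this plain-Python sketch ignores autograd entirely and just shows the
    arithmetic.
    """
    # Box centers.
    px, py = (pred[0] + pred[2]) / 2, (pred[1] + pred[3]) / 2
    gx, gy = (gt[0] + gt[2]) / 2, (gt[1] + gt[3]) / 2
    # Smallest enclosing box and its squared diagonal c^2.
    cw = max(pred[2], gt[2]) - min(pred[0], gt[0])
    ch = max(pred[3], gt[3]) - min(pred[1], gt[1])
    c2 = cw ** 2 + ch ** 2
    # Squared center distance d^2.
    d2 = (px - gx) ** 2 + (py - gy) ** 2
    r = math.exp(d2 / c2) if c2 > 0 else 1.0
    return r * (1.0 - iou(pred, gt))
```

For perfectly aligned boxes the focusing factor is 1 and the loss is 0; as the predicted box drifts from the ground truth, both the IoU term and the exponential factor grow, penalizing poorly placed boxes more strongly than plain IoU loss would.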