Nanjing University of Aeronautics and Astronautics, Nanjing, 210000, Jiangsu, China.
Sci Rep. 2024 Apr 5;14(1):8012. doi: 10.1038/s41598-024-58146-z.
Human pose estimation (HPE) based on deep learning aims to accurately estimate and predict human body posture in images or videos using deep neural networks. However, the accuracy of real-time HPE still needs improvement because of factors such as partial occlusion of body parts and the limited receptive field of the model. To alleviate the accuracy loss caused by these issues, this paper proposes a real-time HPE model called CCAM-Person, based on the YOLOv8 framework. First, we improve the backbone and neck of the YOLOv8x-pose real-time HPE model to alleviate feature loss and receptive field constraints. Second, we introduce the context coordinate attention module (CCAM) to strengthen the model's focus on salient features, reduce background noise interference, alleviate keypoint regression failures caused by limb occlusion, and improve the accuracy of pose estimation. Our approach attains competitive results on multiple metrics of two open-source datasets, MS COCO 2017 and CrowdPose. Compared with the baseline model YOLOv8x-pose, CCAM-Person improves the average precision by 2.8% and 3.5% on the two datasets, respectively.