School of Computer, Jiangsu University of Science and Technology, Zhenjiang 212100, China.
Sensors (Basel). 2024 Sep 26;24(19):6249. doi: 10.3390/s24196249.
Two-dimensional human pose estimation aims to equip computers with the ability to accurately recognize human keypoints and comprehend their spatial relationships within media content. However, the accuracy of real-time human pose estimation degrades on images in which body parts are occluded or individuals overlap. To address these issues, we propose a method based on the YOLO framework. We integrate the convolutional concepts of Kolmogorov-Arnold Networks (KANs) by introducing non-linear activation functions that enhance the feature extraction capabilities of the convolutional kernels. Moreover, to improve the detection of small target keypoints, we integrate the cross-stage partial (CSP) approach and use the small object enhance pyramid (SOEP) module for feature integration. We also incorporate a layered shared convolution with batch normalization detection head (LSCB), consisting of multiple shared convolutional layers and batch normalization layers, to enable cross-stage feature fusion and address the low utilization of model parameters. Given the structure and purpose of the proposed model, we name it KSL-POSE. Compared to the baseline model YOLOv8l-POSE, KSL-POSE increases the average detection accuracy by 1.5% on the public MS COCO 2017 data set. The model also demonstrates competitive performance on the CrowdPOSE data set, supporting its generalization ability.
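The KAN-style convolution mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulation used in KSL-POSE, so everything below is an assumption made for illustration: the class name `KANConv2d`, the single-channel layout, and the choice of Gaussian radial basis functions in place of the B-splines that KANs typically use. The idea it shows is that each kernel position applies its own univariate non-linear function to the input pixel, rather than multiplying it by a scalar weight.

```python
import math

def silu(x):
    """SiLU base activation: x * sigmoid(x)."""
    return x / (1.0 + math.exp(-x))

def rbf_basis(x, centers, width):
    """Gaussian bumps standing in for a B-spline basis (assumption)."""
    return [math.exp(-((x - c) / width) ** 2) for c in centers]

class KANConv2d:
    """Single-channel KAN-style convolution (hypothetical helper).

    Each kernel position (i, j) holds a univariate function
        phi_ij(x) = w_base[i][j] * silu(x) + sum_k coef[i][j][k] * basis_k(x)
    and the output at a location is the sum of phi_ij over the patch,
    instead of the usual sum of weight * x.
    """

    def __init__(self, kernel_size, w_base, coef, centers, width=1.0):
        self.k = kernel_size
        self.w_base = w_base      # k x k base weights
        self.coef = coef          # k x k x len(centers) basis coefficients
        self.centers = centers    # shared grid of basis centers
        self.width = width

    def phi(self, i, j, x):
        """Evaluate the non-linear function stored at kernel position (i, j)."""
        basis = rbf_basis(x, self.centers, self.width)
        spline = sum(a * b for a, b in zip(self.coef[i][j], basis))
        return self.w_base[i][j] * silu(x) + spline

    def __call__(self, image):
        """Valid (no padding), stride-1 convolution over a 2D list of floats."""
        h, w = len(image), len(image[0])
        out_h, out_w = h - self.k + 1, w - self.k + 1
        out = [[0.0] * out_w for _ in range(out_h)]
        for r in range(out_h):
            for c in range(out_w):
                out[r][c] = sum(
                    self.phi(i, j, image[r + i][c + j])
                    for i in range(self.k)
                    for j in range(self.k)
                )
        return out

# Illustrative parameters only; in a trained model these would be learned.
centers = [-1.0, 0.0, 1.0]
w_base = [[1.0, 0.5], [0.5, 1.0]]
coef = [[[0.1, 0.2, 0.1] for _ in range(2)] for _ in range(2)]
conv = KANConv2d(2, w_base, coef, centers)

image = [[0.0, 1.0, 2.0], [3.0, 4.0, 5.0], [6.0, 7.0, 8.0]]
feature_map = conv(image)  # 2 x 2 output for a 3 x 3 input and 2 x 2 kernel
```

Setting every basis coefficient to zero reduces each kernel position to a scaled SiLU, which makes the contrast with an ordinary linear kernel easy to inspect. A practical version would operate on multi-channel tensors in a deep learning framework and train `w_base` and `coef` by gradient descent.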