Guangxi Key Laboratory of Trusted Software, Guilin University of Electronic Technology, Guilin 541004, China.
Sensors (Basel). 2019 Feb 10;19(3):718. doi: 10.3390/s19030718.
In recent years, a growing amount of human data has come from image sensors. In this paper, we propose a novel approach that combines convolutional pose machines (CPMs) with GoogLeNet for human pose estimation from image sensor data. The first stage of the CPMs, into which we introduce several layers from GoogLeNet, directly generates a response map for each key point of the human skeleton from the image. On the one hand, the improved model uses deeper network layers and more complex network structures to strengthen low-level feature extraction. On the other hand, it applies a fine-tuning strategy, which benefits estimation accuracy. Moreover, we introduce the inception structure to greatly reduce the number of model parameters, which significantly shortens convergence time. Extensive experiments on several datasets show that the improved model outperforms most mainstream models in accuracy and training time. Compared with the CPMs, the prediction efficiency of the improved model is improved by 1.023 times, and its training time is reduced by 3.414 times. This paper presents a new idea for future research.
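The claim that the inception structure reduces parameters can be illustrated with simple arithmetic. The sketch below compares a direct 5x5 convolution with an inception-style branch that first applies a 1x1 reduction. The channel sizes and the helper `conv_params` are assumptions chosen for illustration only; they are not taken from the paper and do not describe the authors' actual network.

```python
def conv_params(in_ch, out_ch, k):
    """Number of weights plus biases in a k x k convolution layer."""
    return in_ch * out_ch * k * k + out_ch


# Hypothetical channel sizes, chosen only for illustration.
IN_CH = 192      # channels entering the branch
OUT_CH = 32      # channels produced by the 5x5 convolution
REDUCE_CH = 16   # channels after the 1x1 reduction

# A 5x5 convolution applied directly to the input.
direct = conv_params(IN_CH, OUT_CH, 5)

# Inception-style branch: 1x1 reduction followed by the 5x5 convolution.
bottleneck = conv_params(IN_CH, REDUCE_CH, 1) + conv_params(REDUCE_CH, OUT_CH, 5)

print(f"direct 5x5:        {direct} parameters")
print(f"1x1 then 5x5:      {bottleneck} parameters")
print(f"reduction factor:  {direct / bottleneck:.2f}")
```

Under these assumed sizes, the branch with the 1x1 reduction needs roughly a tenth of the parameters of the direct convolution. The saving in a full model depends on how many such branches it contains and on their channel widths.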