College of Information Science and Engineering, Ritsumeikan University, Kusatsu, Shiga 525-8577, Japan.
Institute of Industrial Science, The University of Tokyo, Tokyo 153-8505, Japan.
Sensors (Basel). 2020 Mar 12;20(6):1593. doi: 10.3390/s20061593.
Image-based understanding of human behavior and activity has been an active topic in computer vision and multimedia. Skeleton estimation, also called pose estimation, is an important part of this problem and has attracted considerable interest. Most deep learning approaches to pose estimation focus mainly on the joint feature. However, the joint feature alone is not sufficient, especially when the image contains multiple people and a pose is occluded or not fully visible. This paper proposes a novel multi-task framework for multi-person pose estimation. The framework is built on the Mask Region-based Convolutional Neural Network (Mask R-CNN) and extended to integrate the joint feature, body boundary, body orientation, and occlusion condition. To further improve multi-person pose estimation, the paper organizes these different kinds of information in serial multi-task models instead of the widely used parallel multi-task network. The proposed models are trained on the public Common Objects in Context (COCO) dataset, augmented with ground truths for body orientation and a mutual-occlusion mask. Experiments demonstrate the performance of the proposed method on multi-person pose estimation and body orientation estimation. The proposed method achieves a Percentage of Correct Keypoints (PCK) of 84.6% and a Correct Detection Rate (CDR) of 83.7%. Comparisons further show that the proposed model reduces over-detection relative to other methods.
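As a rough illustration of the PCK metric reported in the abstract, the sketch below counts a predicted keypoint as correct when it lies within a fraction `alpha` of a per-person reference length from the ground truth. The abstract does not state the threshold or the normalization used, so `alpha`, `ref_len`, and the function name `pck` are assumptions made for this example and not the authors' evaluation code.

```python
import math


def pck(pred, gt, labelled, ref_len, alpha=0.5):
    """Percentage of Correct Keypoints, returned as a fraction in [0, 1].

    pred, gt : per-person lists of (x, y) keypoints, same shape.
    labelled : per-person lists of booleans; unlabelled joints are skipped.
    ref_len  : one reference length per person (assumed here, e.g. a
               bounding-box or torso size; the paper's choice is not given).
    alpha    : assumed distance threshold as a fraction of ref_len.
    """
    correct = 0
    total = 0
    for p_person, g_person, l_person, ref in zip(pred, gt, labelled, ref_len):
        for (px, py), (gx, gy), is_labelled in zip(p_person, g_person, l_person):
            if not is_labelled:
                continue  # no ground truth for this joint
            total += 1
            # Euclidean distance between prediction and ground truth
            if math.hypot(px - gx, py - gy) <= alpha * ref:
                correct += 1
    return correct / total if total else float("nan")


# One person, three joints, the third joint has no ground-truth label.
gt = [[(0, 0), (10, 0), (0, 10)]]
pred = [[(1, 0), (10, 6), (50, 50)]]
labelled = [[True, True, False]]
print(pck(pred, gt, labelled, ref_len=[10], alpha=0.5))  # prints 0.5
```

With a reference length of 10 and `alpha=0.5`, the threshold is 5 pixels: the first joint (distance 1) counts as correct, the second (distance 6) does not, and the unlabelled third joint is ignored.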