CONet: Crowd and occlusion-aware network for occluded human pose estimation.

Author information

School of Software Engineering, Xi'an Jiaotong University, Xi'an 710049, China.

Publication information

Neural Netw. 2024 Apr;172:106109. doi: 10.1016/j.neunet.2024.106109. Epub 2024 Jan 9.

Abstract

Human pose estimation has numerous applications in motion recognition, virtual reality, human-computer interaction, and related fields. However, multi-person pose estimation in crowded and occluded scenes remains challenging. A major limitation of current top-down approaches is that they predict the pose of only a single person, even when the bounding box contains multiple individuals. To address this problem, we propose a novel Crowd and Occlusion-aware Network (CONet) that uses a divide-and-conquer strategy. Our approach includes a Crowd and Occlusion-aware Head (COHead), which estimates the poses of both the occluder and the occluded person using two separate branches. We also use an attention mechanism to guide the branches toward differentiated learning, aiming to improve feature representation. Additionally, we propose a novel interference point loss to enhance the model's anti-interference ability. CONet is simple yet effective: it achieves 71.6 AP on CrowdPose, outperforming the previous state-of-the-art model by 1.6 AP. These results demonstrate its effectiveness in improving the accuracy of human pose estimation in crowded and occluded scenes and highlight its potential in real-world applications where accurate pose estimation is crucial, such as surveillance, sports analysis, and human-computer interaction.
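The abstract does not give the formulation of the interference point loss or the internals of COHead, so the following is only an illustrative sketch of the general idea: two branches each predict a heatmap for one person in the same bounding box, and each branch is additionally penalized for responding at the other person's joint location. Every name here (`gaussian_heatmap`, `branch_loss`, `two_branch_loss`, the `weight` and `sigma` parameters) and the specific penalty term are assumptions made for illustration, not the authors' method.

```python
import math

def gaussian_heatmap(height, width, center, sigma=1.0):
    """Target heatmap with a Gaussian peak at center = (row, col)."""
    cy, cx = center
    return [
        [math.exp(-((y - cy) ** 2 + (x - cx) ** 2) / (2 * sigma ** 2))
         for x in range(width)]
        for y in range(height)
    ]

def mse(pred, target):
    """Mean squared error between two equally sized 2D heatmaps."""
    n = len(pred) * len(pred[0])
    return sum((p - t) ** 2
               for pred_row, target_row in zip(pred, target)
               for p, t in zip(pred_row, target_row)) / n

def interference_penalty(pred, interference_points):
    """Hypothetical term: mean squared response at the other person's joints."""
    if not interference_points:
        return 0.0
    return sum(pred[y][x] ** 2 for y, x in interference_points) / len(interference_points)

def branch_loss(pred, own_joint, other_joint, weight=0.5, sigma=1.0):
    """Heatmap regression loss plus the assumed interference penalty."""
    target = gaussian_heatmap(len(pred), len(pred[0]), own_joint, sigma)
    return mse(pred, target) + weight * interference_penalty(pred, [other_joint])

def two_branch_loss(pred_occluder, pred_occluded,
                    occluder_joint, occluded_joint, weight=0.5):
    """Sum of both branch losses; each branch treats the other person's
    joint as its interference point (an assumption, not the paper's definition)."""
    return (branch_loss(pred_occluder, occluder_joint, occluded_joint, weight)
            + branch_loss(pred_occluded, occluded_joint, occluder_joint, weight))
```

In this sketch, a branch whose heatmap peaks on the wrong person's joint pays both the regression error and the extra penalty, which is one plausible way a loss could discourage confusing the occluder with the occluded person.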
