The State Key Laboratory of Automotive Simulation and Control, Jilin University, Changchun, China.
PLoS One. 2022 Sep 13;17(9):e0274450. doi: 10.1371/journal.pone.0274450. eCollection 2022.
3D human pose estimation has long been an important task in computer vision, especially in crowded scenes where multiple people interact with each other. Many state-of-the-art methods exist for object detection from a single view. However, recovering the locations of people in crowded and occluded scenes is difficult because a single view lacks depth information, which limits robustness. Multi-view multi-person human pose estimation has therefore become an effective alternative. Previous multi-view 3D human pose estimation methods generally follow a strategy of associating the joints of the same person across 2D pose estimates. However, incompleteness and noise in the 2D poses are inevitable, and the association itself is challenging. To address this, we propose a multi-view CTP (Center Point to Pose) network that operates directly in 3D space. The 2D joint features from all cameras are projected into a 3D voxel space. Our CTP network regresses each person's center point as the location and a 3D bounding box as the person's activity area, then estimates a detailed 3D pose within each bounding box. Moreover, the center-regression stage is free of Non-Maximum Suppression, which makes the network simpler and more efficient. Our method achieves competitive performance on several public datasets, demonstrating the efficacy of the center-point-to-pose representation.
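To make the pipeline concrete, below is a minimal sketch (not the authors' code) of the two voxel-space steps the abstract describes: projecting per-view 2D joint heatmaps into a shared 3D voxel grid, and picking person centers without an explicit NMS pass by keeping only voxels that are local maxima of the center heatmap. The function names, the dummy orthographic projection, the 32x32x32 grid, the 0.3 score threshold, and the use of PyTorch are all illustrative assumptions; only the overall steps come from the abstract, and max-pooling peak selection is one common way to be NMS-free at the center stage, not necessarily the paper's exact mechanism.

import torch
import torch.nn.functional as F

def build_voxel_features(heatmaps, cam_project, grid_coords, grid_shape):
    # heatmaps:    (V, C, H, W) per-view 2D joint heatmaps
    # cam_project: list of V callables mapping (N, 3) world points to
    #              (N, 2) normalized image coordinates in [-1, 1]
    # grid_coords: (N, 3) world coordinates of the N voxel centers
    # returns:     (C, D, Hg, Wg) voxel features averaged over views
    V, C, _, _ = heatmaps.shape
    feats = torch.zeros(C, grid_coords.shape[0])
    for v in range(V):
        uv = cam_project[v](grid_coords).view(1, 1, -1, 2)  # grid_sample layout
        sampled = F.grid_sample(heatmaps[v:v + 1], uv, align_corners=False)
        feats += sampled.view(C, -1)
    return (feats / V).view(C, *grid_shape)

def nms_free_centers(center_volume, threshold=0.3):
    # NMS-free peak picking: a voxel counts as a person center iff it equals
    # its own 3x3x3 max-pooled value and exceeds the (assumed) threshold.
    x = center_volume[None, None]                  # (1, 1, D, Hg, Wg)
    pooled = F.max_pool3d(x, kernel_size=3, stride=1, padding=1)
    peaks = (x == pooled) & (x > threshold)
    return peaks[0, 0].nonzero()                   # (K, 3) voxel indices

# Toy usage: 4 cameras, 15 joints, a 32x32x32 grid, and a dummy orthographic
# projection standing in for real calibrated cameras.
V, C, grid_shape = 4, 15, (32, 32, 32)
heatmaps = torch.rand(V, C, 64, 64)
axes = [torch.linspace(-1, 1, s) for s in grid_shape]
grid_coords = torch.stack(torch.meshgrid(*axes, indexing="ij"), dim=-1).reshape(-1, 3)
cam_project = [lambda p: p[:, :2] for _ in range(V)]
volume = build_voxel_features(heatmaps, cam_project, grid_coords, grid_shape)
centers = nms_free_centers(volume.max(dim=0).values)  # crude center score: max over joints
print(centers.shape)

In the real method, each detected center would anchor a 3D bounding box in which a finer network regresses the per-joint pose; the max-over-joints center score above is only a stand-in for a learned center heatmap.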