IEEE Trans Pattern Anal Mach Intell. 2016 Oct;38(10):1929-42. doi: 10.1109/TPAMI.2015.2509986. Epub 2015 Dec 17.
We address the problem of 3D pose estimation of multiple humans from multiple views. Moving from single to multiple human pose estimation, and from 2D to 3D space, is challenging because the state space is much larger, occlusions occur, and detections are ambiguous across views when the identities of the humans are not known in advance. To address these problems, we first create a reduced state space by triangulating corresponding pairs of body parts obtained from part detectors in each camera view. To resolve the ambiguities that remain after triangulation, which arise from wrong and mixed parts of multiple humans and from false positive detections, we introduce a 3D pictorial structures (3DPS) model. Our model builds on multi-view unary potentials, while a prior model is integrated into pairwise and ternary potential functions. To balance the influence of the potentials, the model parameters are learnt using a Structured SVM (SSVM). The model is generic and applies to both single and multiple human pose estimation. We evaluate it on both tasks using four different datasets. We first analyse the contribution of the potentials and then compare our results with related work, demonstrating superior performance.
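The reduced state space described above can be illustrated with a minimal sketch, under stated assumptions. The abstract does not specify the triangulation method, so linear (DLT) triangulation is used here as a common stand-in. The function names `triangulate_pair` and `candidate_state_space`, and the input layout (one 3x4 projection matrix and one list of 2D detections per view, for a single body part), are hypothetical and chosen only for illustration.

```python
import itertools

import numpy as np


def triangulate_pair(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one 2D correspondence seen in two views.

    P1, P2: 3x4 camera projection matrices (assumed known from calibration).
    x1, x2: 2D image points (x, y) of the same body part in each view.
    """
    # Each view contributes two linear constraints on the homogeneous 3D point.
    A = np.stack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    # The solution is the right singular vector with the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]


def candidate_state_space(projections, detections):
    """Triangulate every cross-view pair of detections of one body part.

    projections: list of 3x4 matrices, one per view.
    detections: list (one entry per view) of lists of (x, y) detections.
    Returns an (N, 3) array of 3D candidates. Pairs that mix different
    people or include false positives are kept on purpose: resolving them
    is the job of the later inference step, not of this sketch.
    """
    candidates = []
    for a, b in itertools.combinations(range(len(projections)), 2):
        for xa in detections[a]:
            for xb in detections[b]:
                candidates.append(
                    triangulate_pair(
                        projections[a], projections[b],
                        np.asarray(xa, dtype=float),
                        np.asarray(xb, dtype=float),
                    )
                )
    return np.array(candidates)
```

With two views and two detections of a body part in each, this yields four 3D candidates, of which at most two correspond to real people. The remaining candidates are the kind of wrong and mixed hypotheses that the 3DPS model is introduced to resolve.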