School of Electronic, Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China.
IEEE Trans Image Process. 2011 Apr;20(4):1141-51. doi: 10.1109/TIP.2010.2076820. Epub 2010 Sep 16.
Human pose estimation via motion tracking systems can be considered a regression problem within a discriminative framework. Modeling the mapping from observation space to state space is challenging because the conditional distribution is high-dimensional and multimodal. To build this mapping, existing techniques usually rely on a large set of training samples in the learning process, yet they remain limited in their capability to deal with multimodality. In this work, we propose a novel online sparse Gaussian Process (GP) regression model to recover 3-D human motion from monocular videos. In particular, we exploit the observation that, for a given test input, the output is determined mainly by the training samples that potentially lie in its local neighborhood, defined in the unified input-output space. This leads to a local mixture-of-GP-experts system composed of different local GP experts, each of which governs the mapping in one local region through its own covariance function adapted to that region. To handle multimodality, we combine temporal and spatial information to obtain two categories of local experts. The temporal and spatial experts are integrated into a seamless hybrid system that is automatically self-initialized and robust for visual tracking of nonlinear human motion. Learning and inference are highly efficient because all the local experts are defined online within very small neighborhoods. Extensive experiments on two real-world databases, HumanEva and PEAR, demonstrate the effectiveness of the proposed model, which significantly improves on the performance of existing models.
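The core idea of a single local expert, predicting a test input's output from a GP fitted only to its nearest training samples, can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the neighborhood is chosen by Euclidean distance in input space only (the abstract defines it in the unified input-output space), one fixed squared-exponential covariance with hand-set hyperparameters stands in for the locally adapted covariance functions, and the names `local_gp_predict`, `k`, `lengthscale`, `signal_var`, and `noise_var` are hypothetical. The temporal and spatial experts and the way they are fused are not modeled here.

```python
import math


def sq_exp(a, b, lengthscale=1.0, signal_var=1.0):
    """Squared-exponential (RBF) covariance between two input vectors (assumed kernel)."""
    d2 = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return signal_var * math.exp(-0.5 * d2 / lengthscale ** 2)


def cholesky(A):
    """Lower-triangular L with A = L L^T, for symmetric positive-definite A."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][m] * L[j][m] for m in range(j))
            if i == j:
                L[i][j] = math.sqrt(A[i][i] - s)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L


def forward_sub(L, b):
    """Solve L x = b for lower-triangular L."""
    x = []
    for i in range(len(b)):
        x.append((b[i] - sum(L[i][m] * x[m] for m in range(i))) / L[i][i])
    return x


def backward_sub(L, b):
    """Solve L^T x = b for lower-triangular L."""
    n = len(b)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (b[i] - sum(L[m][i] * x[m] for m in range(i + 1, n))) / L[i][i]
    return x


def local_gp_predict(X, Y, x_star, k=5, lengthscale=1.0, signal_var=1.0, noise_var=1e-2):
    """Hypothetical local GP expert: fit a GP to the k nearest training samples of x_star.

    X: list of input vectors (e.g. image features); Y: list of output vectors
    (e.g. pose parameters). Returns (mean vector, latent predictive variance);
    the variance is shared across output dimensions because they share one kernel.
    """
    # 1. Local neighborhood: k nearest training inputs by Euclidean distance.
    idx = sorted(range(len(X)),
                 key=lambda i: sum((a - b) ** 2 for a, b in zip(X[i], x_star)))[:k]
    Xn = [X[i] for i in idx]
    Yn = [Y[i] for i in idx]
    # 2. Local k x k covariance matrix with observation noise on the diagonal.
    K = [[sq_exp(a, b, lengthscale, signal_var) + (noise_var if i == j else 0.0)
          for j, b in enumerate(Xn)] for i, a in enumerate(Xn)]
    L = cholesky(K)
    k_star = [sq_exp(a, x_star, lengthscale, signal_var) for a in Xn]
    # 3. Predictive mean per output dimension: k_*^T (K + noise_var * I)^{-1} y_d.
    mean = []
    for d in range(len(Yn[0])):
        y_d = [y[d] for y in Yn]
        alpha = backward_sub(L, forward_sub(L, y_d))
        mean.append(sum(ks * al for ks, al in zip(k_star, alpha)))
    # 4. Predictive variance: k(x*, x*) - v^T v with v = L^{-1} k_*.
    v = forward_sub(L, k_star)
    var = signal_var - sum(vi * vi for vi in v)
    return mean, var


if __name__ == "__main__":
    # Toy 1-D input with a 2-D output, standing in for features -> pose.
    X = [[0.1 * i] for i in range(63)]
    Y = [[math.sin(x[0]), math.cos(x[0])] for x in X]
    mean, var = local_gp_predict(X, Y, [1.05], k=8, lengthscale=0.5, noise_var=1e-4)
    print(mean, var)
```

Because each query factorizes only a k×k matrix, the cost per prediction is O(k³) apart from the neighbor search, instead of O(n³) for an exact GP over all n training samples; this is the kind of saving the efficiency claim above relies on.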