Liu Xin, Shi Henglin, Hong Xiaopeng, Chen Haoyu, Tao Dacheng, Zhao Guoying
IEEE Trans Image Process. 2020 Feb 21. doi: 10.1109/TIP.2020.2974061.
Temporal dynamics is an open issue in modeling human body gestures. One solution is to resort to generative models such as the hidden Markov model (HMM). However, most existing work assumes fixed anchors for each hidden state, which makes it hard to describe the explicit temporal structure of gestures. Based on the observation that a gesture is a time series with distinctly defined phases, we propose a new formulation that builds temporal compositions of gestures via low-rank matrix decomposition. The only assumption is that the gesture's "hold" phases, which contain static poses, are linearly correlated with one another. As such, a gesture sequence can be segmented into temporal states with semantically meaningful and discriminative concepts. Furthermore, unlike traditional HMMs, which tend to use a specific distance metric for clustering and ignore temporal contextual information when estimating the emission probability, we use a long short-term memory (LSTM) network to learn probability distributions over the HMM states. The proposed method is validated on multiple challenging datasets. Experiments demonstrate that our approach works effectively on a wide range of gestures and achieves state-of-the-art performance.
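The abstract pairs an LSTM with an HMM. The common hybrid neural-network/HMM recipe can illustrate how these two pieces fit together. The following is a minimal sketch and not the authors' implementation. It makes three assumptions:

* The LSTM outputs per-frame state posteriors p(s | x_t).
* These posteriors are turned into scaled emission likelihoods by dividing by the state priors p(s).
* The most likely hidden-state sequence is then recovered with the Viterbi algorithm.

The posterior array, the three-state left-to-right topology, and the function names `scaled_log_likelihoods` and `viterbi` are hypothetical and chosen only for illustration.

```python
import numpy as np


def scaled_log_likelihoods(posteriors, priors, eps=1e-12):
    """Hybrid NN/HMM trick (assumed here, not taken from the paper):
    log p(x_t | s) = log p(s | x_t) - log p(s) + const(t)."""
    return np.log(posteriors + eps) - np.log(priors + eps)


def viterbi(log_emit, log_trans, log_init):
    """Return the most likely state path for per-frame log emissions (T x S)."""
    T, S = log_emit.shape
    delta = np.empty((T, S))                 # best log score ending in each state
    backptr = np.zeros((T, S), dtype=int)    # best predecessor state
    delta[0] = log_init + log_emit[0]
    for t in range(1, T):
        # scores[i, j]: best path ending in state i at t-1, then moving i -> j
        scores = delta[t - 1][:, None] + log_trans
        backptr[t] = np.argmax(scores, axis=0)
        delta[t] = scores[backptr[t], np.arange(S)] + log_emit[t]
    # Trace the best final state back through the stored predecessors
    path = [int(np.argmax(delta[-1]))]
    for t in range(T - 1, 0, -1):
        path.append(int(backptr[t, path[-1]]))
    return path[::-1]


# Hypothetical stand-in for LSTM softmax outputs: 6 frames, 3 gesture states
posteriors = np.array([
    [0.8, 0.1, 0.1],
    [0.6, 0.3, 0.1],
    [0.2, 0.7, 0.1],
    [0.1, 0.8, 0.1],
    [0.1, 0.3, 0.6],
    [0.1, 0.1, 0.8],
])
priors = np.full(3, 1.0 / 3.0)               # assumed state priors p(s)
trans = np.array([[0.7, 0.3, 0.0],           # left-to-right topology (assumed)
                  [0.0, 0.7, 0.3],
                  [0.0, 0.0, 1.0]])
init = np.array([1.0, 0.0, 0.0])

with np.errstate(divide="ignore"):           # log(0) -> -inf marks forbidden moves
    log_trans, log_init = np.log(trans), np.log(init)

path = viterbi(scaled_log_likelihoods(posteriors, priors), log_trans, log_init)
print(path)  # -> [0, 0, 1, 1, 2, 2]
```

With uniform priors, the scaled likelihoods differ from the log posteriors only by a per-frame constant. As a result, the decoded path follows the posteriors, while the zero entries in the transition matrix prevent the decoder from returning to an earlier state. With non-uniform priors, dividing by p(s) keeps states that are frequent in the training data from being favored merely because they are common.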