IEEE Trans Pattern Anal Mach Intell. 2014 Dec;36(12):2466-82. doi: 10.1109/TPAMI.2014.2329301.
In this paper, we address the problem of human action recognition by combining global temporal dynamics with local visual spatio-temporal appearance features. For this purpose, in the global temporal dimension, we propose to model the motion dynamics with robust linear dynamical systems (LDSs) and use the model parameters as motion descriptors. Since LDSs live in a non-Euclidean space and the descriptors are in non-vector form, we propose a shift-invariant distance based on subspace angles to measure the similarity between LDSs. In the local visual dimension, we construct curved spatio-temporal cuboids along the trajectories of densely sampled feature points and describe them using histograms of oriented gradients (HOG). The distance between motion sequences is computed with the chi-squared histogram distance in the bag-of-words framework. Finally, we perform classification using a maximum-margin distance learning method that combines the global dynamic distances and the local visual distances. We evaluate our approach for action recognition on five short-clip data sets, namely Weizmann, KTH, UCF Sports, Hollywood2, and UCF50, as well as three long continuous data sets, namely VIRAT, ADL, and CRIM13, and show competitive results compared with current state-of-the-art methods.
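Two of the distances mentioned above have standard, compact definitions. The chi-squared histogram distance between bag-of-words histograms h1 and h2 is d(h1, h2) = (1/2) Σᵢ (h1ᵢ − h2ᵢ)² / (h1ᵢ + h2ᵢ), and subspace angles between two subspaces can be computed as the arccosines of the singular values of the product of their orthonormal bases. The sketch below illustrates only these generic building blocks, not the paper's full shift-invariant LDS distance, which is derived from the estimated system parameters:

```python
import numpy as np

def chi2_distance(h1, h2, eps=1e-10):
    """Chi-squared distance between two histograms (e.g. bag-of-words
    histograms of HOG codewords). eps guards against empty bins."""
    h1 = np.asarray(h1, dtype=float)
    h2 = np.asarray(h2, dtype=float)
    return 0.5 * np.sum((h1 - h2) ** 2 / (h1 + h2 + eps))

def principal_angles(A, B):
    """Principal (subspace) angles between the column spaces of A and B.
    Orthonormalize each basis with QR, then the singular values of
    Qa^T Qb are the cosines of the principal angles."""
    Qa, _ = np.linalg.qr(np.asarray(A, dtype=float))
    Qb, _ = np.linalg.qr(np.asarray(B, dtype=float))
    s = np.linalg.svd(Qa.T @ Qb, compute_uv=False)
    return np.arccos(np.clip(s, -1.0, 1.0))
```

For example, the subspaces span{e1, e2} and span{e1, e3} in R^3 share one direction and are orthogonal in the other, so `principal_angles` returns angles 0 and pi/2.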