IEEE Trans Image Process. 2017 Dec;26(12):5560-5574. doi: 10.1109/TIP.2017.2740122. Epub 2017 Aug 14.
This paper presents a novel approach to action recognition using synthetic multi-view data generated from depth maps. Specifically, multiple views are first synthesized by rotating the 3D point clouds reconstructed from depth maps. A pyramid multi-view depth motion template is then adopted for multi-view action representation, characterizing multi-scale motion and shape patterns in 3D. Empirically, beyond the view-specific information, the latent information shared across multiple views often provides important cues for action recognition. Motivated by this observation and by the success of the dictionary learning framework, this paper proposes to explicitly learn a view-specific dictionary (called the specificity) for each view, and simultaneously learn a latent dictionary (called the latent correlation) shared across multiple views. A novel method, specificity and latent correlation learning, is thus put forward: the specificity captures the most discriminative features of each view, while the latent correlation contributes the inherent 3D information common to all views. In this way, the specificity and latent correlation together form a compact and discriminative dictionary for feature representation of actions. The proposed method is evaluated on the MSR Action3D, MSR Gesture3D, MSR Action Pairs, and ChaLearn multi-modal data sets, consistently achieving promising results compared with state-of-the-art methods based on depth data.
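The first step above, synthesizing multiple views by rotating 3D point clouds recovered from depth maps, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the pinhole intrinsics (fx, fy, cx, cy) and the set of rotation angles are hypothetical, since the abstract does not specify them.

```python
import numpy as np

def depth_to_points(depth, fx, fy, cx, cy):
    """Back-project a depth map (meters) into a 3D point cloud with a
    pinhole camera model; intrinsics are assumed to be known."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    pts = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return pts[pts[:, 2] > 0]  # drop invalid (zero-depth) pixels

def rotate_y(points, angle_deg):
    """Rotate a point cloud about the vertical (y) axis to synthesize
    the point cloud seen from a new virtual viewpoint."""
    t = np.deg2rad(angle_deg)
    R = np.array([[np.cos(t), 0.0, np.sin(t)],
                  [0.0,       1.0, 0.0      ],
                  [-np.sin(t), 0.0, np.cos(t)]])
    return points @ R.T

# Illustrative angle set for synthesizing virtual views:
# views = [rotate_y(pts, a) for a in (-45, -30, -15, 0, 15, 30, 45)]
```

Each rotated cloud would then be re-projected to a depth map from which the depth motion templates are computed.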
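The joint learning of specificity and latent correlation can be sketched as a multi-view dictionary learning problem: each view's features X_v are approximated by [S_v | L] A_v, where S_v is the view-specific dictionary and L is shared by all views. The sketch below uses generic ISTA sparse coding and ridge-regularized least-squares dictionary updates as stand-ins, since the paper's exact objective and regularizers are not given in the abstract; all dimensions and hyperparameters are illustrative.

```python
import numpy as np

def _normalize_cols(D):
    """Rescale dictionary atoms (columns) to unit L2 norm."""
    return D / np.maximum(np.linalg.norm(D, axis=0, keepdims=True), 1e-12)

def _ista(D, X, lam, n_steps=30):
    """Sparse-code X over dictionary D by ISTA (L1-regularized)."""
    L = np.linalg.norm(D, 2) ** 2 + 1e-12  # Lipschitz constant of the gradient
    A = np.zeros((D.shape[1], X.shape[1]))
    for _ in range(n_steps):
        A = A - (D.T @ (D @ A - X)) / L           # gradient step
        A = np.sign(A) * np.maximum(np.abs(A) - lam / L, 0.0)  # shrinkage
    return A

def learn_specificity_latent(views, k_spec=8, k_lat=4, lam=0.1,
                             n_iter=20, seed=0):
    """Alternately learn one view-specific dictionary S_v per view
    ("specificity") and a single latent dictionary Lt shared across all
    views ("latent correlation"), modeling X_v ~ [S_v | Lt] A_v."""
    rng = np.random.default_rng(seed)
    d = views[0].shape[0]
    S = [_normalize_cols(rng.standard_normal((d, k_spec))) for _ in views]
    Lt = _normalize_cols(rng.standard_normal((d, k_lat)))
    for _ in range(n_iter):
        # sparse coding over the concatenated dictionary of each view
        codes = [_ista(np.hstack([S[v], Lt]), X, lam)
                 for v, X in enumerate(views)]
        # update each specificity on the residual left by the latent part
        for v, X in enumerate(views):
            A_s, A_l = codes[v][:k_spec], codes[v][k_spec:]
            R = X - Lt @ A_l
            G = A_s @ A_s.T + 1e-6 * np.eye(k_spec)
            S[v] = _normalize_cols(R @ A_s.T @ np.linalg.inv(G))
        # update the shared latent dictionary on all views' residuals
        R_all = np.hstack([X - S[v] @ codes[v][:k_spec]
                           for v, X in enumerate(views)])
        A_all = np.hstack([codes[v][k_spec:] for v in range(len(views))])
        G = A_all @ A_all.T + 1e-6 * np.eye(k_lat)
        Lt = _normalize_cols(R_all @ A_all.T @ np.linalg.inv(G))
    return S, Lt
```

The compact dictionary used for action representation would then be the concatenation of each S_v with the shared Lt, so the sparse codes carry both view-discriminative and view-shared 3D cues.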