Liu Jun, Shahroudy Amir, Xu Dong, Kot Alex C, Wang Gang
IEEE Trans Pattern Anal Mach Intell. 2018 Dec;40(12):3007-3021. doi: 10.1109/TPAMI.2017.2771306. Epub 2017 Nov 9.
Skeleton-based human action recognition has attracted considerable research attention in recent years. Recent works have used recurrent neural networks to model the temporal dependencies between the 3D positional configurations of human body joints for better analysis of human activities in skeletal data. The proposed work extends this idea to the spatial domain as well as the temporal domain, so that the hidden sources of action-related information within human skeleton sequences are analyzed in both domains simultaneously. Based on the pictorial structure of Kinect's skeletal data, an effective tree-structure based traversal framework is also proposed. To deal with noise in the skeletal data, a new gating mechanism is introduced within the LSTM module, with which the network can learn the reliability of the sequential data and accordingly adjust the effect of the input data on the updating of the long-term context representation stored in the unit's memory cell. Moreover, we introduce a novel multi-modal feature fusion strategy within the LSTM unit. Comprehensive experimental results on seven challenging benchmark datasets for human action recognition demonstrate the effectiveness of the proposed method.
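As a rough illustration of two ideas named in the abstract, the sketch below shows (1) one plausible tree-structured traversal of skeleton joints and (2) a reliability-weighted memory update. It is a sketch under stated assumptions, not the authors' implementation: the six-joint `SKELETON`, the function names `traverse`, `trust`, and `update_cell`, the Gaussian-shaped trust score, and the parameter `lam` are all hypothetical choices made for illustration, and the scalar update stands in for a full LSTM unit.

```python
import math

# Hypothetical 6-joint skeleton (not the Kinect layout): parent -> children.
SKELETON = {
    "spine": ["neck", "left_hip", "right_hip"],
    "neck": ["head"],
    "head": [],
    "left_hip": ["left_knee"],
    "left_knee": [],
    "right_hip": [],
}


def traverse(tree, root):
    """Depth-first walk that returns to the parent after each child, so
    every pair of consecutive joints in the result is directly connected."""
    order = [root]
    for child in tree[root]:
        order.extend(traverse(tree, child))
        order.append(root)
    return order


def trust(predicted, observed, lam=0.5):
    """Assumed reliability score in [0, 1]: equal to 1 when the input
    matches the value predicted from context, and decaying toward 0 as
    the squared error grows."""
    return math.exp(-lam * (predicted - observed) ** 2)


def update_cell(c_prev, candidate, predicted, observed, lam=0.5):
    """Blend the previous memory and the new candidate by the trust score:
    a trusted input overwrites memory, an untrusted one leaves it intact."""
    t = trust(predicted, observed, lam)
    return t * candidate + (1.0 - t) * c_prev


if __name__ == "__main__":
    print(traverse(SKELETON, "spine"))
    print(update_cell(c_prev=2.0, candidate=5.0, predicted=1.0, observed=1.2))
```

Feeding joints to a recurrent unit in the order returned by `traverse` keeps adjacent steps physically adjacent in the body, which is the motivation for a tree-based ordering over a flat joint list. In `update_cell`, an input far from its predicted value receives a trust score near zero, so the stored memory is left almost unchanged.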