School of Electrical and Information Engineering, Tianjin University, Tianjin 300072, China.
Advanced Multimedia Research Lab, University of Wollongong, NSW 2522, Australia.
Sensors (Basel). 2020 Jun 10;20(11):3305. doi: 10.3390/s20113305.
The paper presents a novel hybrid network for large-scale action recognition from multiple modalities. The network is built upon the proposed weighted dynamic images. It effectively leverages the strengths of the emerging Convolutional Neural Network (CNN)- and Recurrent Neural Network (RNN)-based approaches to specifically address the challenges that arise in large-scale action recognition and are not fully dealt with by state-of-the-art methods. Specifically, the proposed hybrid network consists of a CNN-based component and an RNN-based component. Features extracted by the two components are fused through canonical correlation analysis and then fed to a linear Support Vector Machine (SVM) for classification. The proposed network achieved state-of-the-art results on the ChaLearn LAP IsoGD, NTU RGB+D and Multi-modal & Multi-view & Interactive (M²I) datasets and outperformed existing methods by a large margin (over 10 percentage points in some cases).