IEEE Trans Pattern Anal Mach Intell. 2018 May;40(5):1045-1058. doi: 10.1109/TPAMI.2017.2691321. Epub 2017 Apr 5.
Single-modality action recognition on RGB or depth sequences has been explored extensively in recent years. It is generally accepted that each of these two modalities has distinct strengths and limitations for action recognition. Analyzing RGB+D videos therefore allows us to study the complementary properties of the two modalities and achieve higher levels of performance. In this paper, we propose a new deep-autoencoder-based shared-specific feature factorization network that separates the input multimodal signals into a hierarchy of components. Further, based on the structure of these features, we propose a structured sparsity learning machine that uses mixed norms to apply regularization within components and group selection between them for better classification performance. Our experimental results demonstrate the effectiveness of the proposed cross-modality feature analysis framework, which achieves state-of-the-art action classification accuracy on five challenging benchmark datasets.
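The mixed-norm regularizer described above is commonly realized as an ℓ2,1 (group-lasso) penalty: an ℓ2 norm within each feature component and an ℓ1 norm across components, so that whole components can be switched off. As an illustrative sketch only (not the paper's implementation), the following shows how the proximal operator of such a mixed norm performs group selection on a toy weight vector with three hypothetical components:

```python
import numpy as np

def mixed_l21_norm(w, groups):
    # sum over groups of each group's l2 norm: an l1 norm across
    # groups promotes sparsity *between* components while the inner
    # l2 norm keeps weights *within* a selected component dense
    return sum(np.linalg.norm(w[idx]) for idx in groups)

def group_soft_threshold(w, groups, lam):
    # proximal operator of lam * ||w||_{2,1}: shrinks each group's
    # l2 norm by lam and zeroes out entire groups whose norm <= lam
    out = np.zeros_like(w, dtype=float)
    for idx in groups:
        norm = np.linalg.norm(w[idx])
        if norm > lam:
            out[idx] = w[idx] * (1.0 - lam / norm)
    return out

# toy example: three "components" of four features each
w = np.array([0.9, 1.1, -0.8, 1.0,       # strong component, kept
              0.05, -0.04, 0.03, 0.02,   # weak component, pruned whole
              0.6, -0.7, 0.5, 0.4])      # moderate component, shrunk
groups = [np.arange(0, 4), np.arange(4, 8), np.arange(8, 12)]

w_new = group_soft_threshold(w, groups, lam=0.5)
```

The key behavior is that the weak middle component is removed in its entirety rather than feature by feature, which is the "group selection between components" effect the abstract refers to.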