Shi Lei, Zhang Yifan, Cheng Jian, Lu Hanqing
IEEE Trans Image Process. 2020 Oct 9;PP. doi: 10.1109/TIP.2020.3028207.
Graph convolutional networks (GCNs), which generalize CNNs to more generic non-Euclidean structures, have achieved remarkable performance in skeleton-based action recognition. However, previous GCN-based models still have several issues. First, the topology of the graph is set heuristically and fixed across all model layers and input data, which may not suit the hierarchical structure of GCN models or the diversity of the data in action recognition tasks. Second, the second-order information of the skeleton data, i.e., the length and orientation of the bones, is rarely investigated, even though it is naturally more informative and discriminative for human action recognition. In this work, we propose a novel multi-stream attention-enhanced adaptive graph convolutional neural network (MS-AAGCN) for skeleton-based action recognition. The graph topology in our model can be learned either uniformly or individually from the input data in an end-to-end manner. This data-driven approach makes graph construction more flexible and makes the model more general, so that it adapts to various data samples. In addition, the proposed adaptive graph convolutional layer is further enhanced by a spatial-temporal-channel attention module, which helps the model focus on important joints, frames, and features. Moreover, the information of both the joints and bones, together with their motion information, is modeled simultaneously in a multi-stream framework, which notably improves recognition accuracy. Extensive experiments on two large-scale datasets, NTU-RGBD and Kinetics-Skeleton, demonstrate that our model outperforms the state-of-the-art by a significant margin.
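The abstract describes three kinds of input information: joints, bones (second-order data giving bone length and orientation), and their motion, each handled by its own stream whose results are combined. The following is a minimal pure-Python sketch of how such inputs could be derived from raw joint coordinates and how per-stream class scores could be merged by weighted late fusion. It is not the authors' code. The 5-joint skeleton `TOY_PARENTS`, the zero bone assigned to the root joint, the zero-padding of the last motion frame, the fusion weights, and all helper names (`bones`, `length_and_orientation`, `motion`, `four_streams`, `fuse`) are illustrative assumptions. The adaptive graph convolutional networks that would produce each stream's scores are not modeled.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Frame = List[Vec3]  # one (x, y, z) coordinate per joint

# Hypothetical 5-joint toy skeleton: child joint -> parent joint (toward the body center).
# Real datasets such as NTU-RGBD define their own joint layout and bone pairs.
TOY_PARENTS: Dict[int, int] = {1: 0, 2: 1, 3: 1, 4: 0}


def bones(frame: Frame, parents: Dict[int, int]) -> Frame:
    """Bone vectors: child joint minus parent joint.

    The root (a joint with no parent) gets a zero vector so the bone stream keeps
    one entry per joint, matching the shape of the joint stream (an assumption).
    """
    out: Frame = []
    for j, (x, y, z) in enumerate(frame):
        if j in parents:
            px, py, pz = frame[parents[j]]
            out.append((x - px, y - py, z - pz))
        else:
            out.append((0.0, 0.0, 0.0))
    return out


def length_and_orientation(bone: Vec3) -> Tuple[float, Vec3]:
    """Split a bone vector into its length and unit direction (zero bone -> zeros)."""
    n = math.sqrt(sum(c * c for c in bone))
    if n == 0.0:
        return 0.0, (0.0, 0.0, 0.0)
    return n, (bone[0] / n, bone[1] / n, bone[2] / n)


def motion(seq: Sequence[Frame]) -> List[Frame]:
    """Frame-to-frame differences (next minus current); last frame zero-padded."""
    out: List[Frame] = []
    for t in range(len(seq)):
        if t + 1 < len(seq):
            out.append([(q[0] - p[0], q[1] - p[1], q[2] - p[2])
                        for p, q in zip(seq[t], seq[t + 1])])
        else:
            out.append([(0.0, 0.0, 0.0)] * len(seq[t]))
    return out


def four_streams(joint_seq: Sequence[Frame], parents: Dict[int, int]) -> Dict[str, List[Frame]]:
    """Joint, bone, joint-motion and bone-motion inputs, one per hypothetical network."""
    bone_seq = [bones(f, parents) for f in joint_seq]
    return {
        "joint": list(joint_seq),
        "bone": bone_seq,
        "joint_motion": motion(joint_seq),
        "bone_motion": motion(bone_seq),
    }


def fuse(stream_scores: Dict[str, List[float]], weights: Dict[str, float]) -> int:
    """Late fusion: weighted sum of per-stream class scores, then argmax."""
    n_classes = len(next(iter(stream_scores.values())))
    total = [0.0] * n_classes
    for name, scores in stream_scores.items():
        w = weights.get(name, 1.0)  # streams without an explicit weight count as 1
        for c, s in enumerate(scores):
            total[c] += w * s
    return max(range(n_classes), key=lambda c: total[c])
```

In this sketch, `four_streams(seq, TOY_PARENTS)` returns four sequences with the same shape, one per joint and one per frame. Each would be fed to a separate network, and `fuse` would combine the resulting score vectors. Because bones and motion are computed as simple differences, every stream stays aligned joint-for-joint with the original skeleton. The same graph structure can therefore be reused across streams.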