Wu Zhize, Sun Pengpeng, Chen Xin, Tang Keke, Xu Tong, Zou Le, Wang Xiaofeng, Tan Ming, Cheng Fan, Weise Thomas
IEEE Trans Image Process. 2024;33:4391-4403. doi: 10.1109/TIP.2024.3433581. Epub 2024 Aug 5.
Graph Convolutional Networks (GCNs) are widely used for skeleton-based action recognition and have achieved remarkable performance. Due to the locality of graph convolution, GCNs can only exploit short-range node dependencies and fail to model long-range node relationships. In addition, existing graph-convolution-based methods typically use a uniform skeleton topology for all frames, which limits their feature-learning ability. To address these issues, we present the Graph Convolution Network with Self-Attention (SelfGCN), which consists of a Mixing Features across Self-attention and Graph convolution (MFSG) module and a Temporal-Specific Spatial self-Attention (TSSA) module. The MFSG module models local and global relationships between joints by executing graph convolution and self-attention branches in parallel. Its bi-directional interactive learning strategy exchanges complementary cues between the two branches along both the channel and spatial dimensions. The TSSA module uses self-attention to learn the spatial relationships between the joints of each frame in a skeleton sequence, thereby modeling the spatial features unique to individual frames. We conduct extensive experiments on three popular benchmark datasets: NTU RGB+D, NTU RGB+D 120, and Northwestern-UCLA. The experimental results demonstrate that our method matches or exceeds state-of-the-art accuracy on all three benchmarks. Our project website is available at https://github.com/SunPengP/SelfGCN.
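The following is a minimal sketch of the core idea behind the MFSG module described above: a local graph-convolution branch over the fixed skeleton adjacency run in parallel with a global self-attention branch over all joint pairs, with their outputs mixed. All class and parameter names (MFSGSketch, channels, adjacency), tensor shapes, and the simple concatenation-based mixing are illustrative assumptions, not the authors' implementation; the paper's bi-directional interactive learning between the branches is not reproduced here.

```python
import torch
import torch.nn as nn


class MFSGSketch(nn.Module):
    """Illustrative sketch (not the authors' code): parallel graph-convolution
    and self-attention branches over skeleton joints, mixed by concatenation."""

    def __init__(self, channels: int, adjacency: torch.Tensor):
        super().__init__()
        # Fixed skeleton adjacency (V x V, float), row-normalized for the local branch.
        self.register_buffer(
            "A", adjacency / adjacency.sum(-1, keepdim=True).clamp(min=1.0)
        )
        self.gcn_proj = nn.Linear(channels, channels)   # local branch: neighbours only
        self.qkv = nn.Linear(channels, 3 * channels)    # global branch: all joint pairs
        self.out = nn.Linear(2 * channels, channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch * frames, V, channels) -- spatial modelling applied per frame,
        # which is also the spirit of the per-frame attention in the TSSA module.
        local = self.gcn_proj(self.A @ x)               # short-range dependencies
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        attn = torch.softmax(q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5, dim=-1)
        global_ = attn @ v                              # long-range dependencies
        return self.out(torch.cat([local, global_], dim=-1))


if __name__ == "__main__":
    V, C = 25, 64                                       # e.g. 25 joints (NTU RGB+D skeleton)
    adjacency = torch.eye(V) + torch.rand(V, V).round() # placeholder skeleton graph
    block = MFSGSketch(C, adjacency)
    x = torch.randn(8, V, C)                            # 8 frames of one sequence
    print(block(x).shape)                               # torch.Size([8, 25, 64])
```

Because the self-attention weights are recomputed for every frame, the effective joint topology varies across time, in contrast to a single uniform adjacency shared by all frames.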