Gao Xuehao, Yang Yang, Wu Yang, Du Shaoyi
IEEE Trans Neural Netw Learn Syst. 2024 Sep;35(9):12130-12141. doi: 10.1109/TNNLS.2023.3252172. Epub 2024 Sep 3.
Graph convolution networks (GCNs) have been widely used and have achieved fruitful progress in the skeleton-based action recognition task. In GCNs, node interaction modeling dominates the context aggregation and, therefore, is crucial for a graph-based convolution kernel to extract representative features. In this article, we take a closer look at a powerful graph convolution formulation to capture rich movement patterns from these skeleton-based graphs. Specifically, we propose a novel heterogeneous graph convolution (HetGCN) that can be considered as the middle ground between the extremes of (2 + 1)-D and 3-D graph convolution. The core observation of HetGCN is that multiple information flows are jointly intertwined in a 3-D convolution kernel, including spatial, temporal, and spatial-temporal cues. Since spatial and temporal information flows characterize different cues for action recognition, HetGCN first dynamically analyzes pairwise interactions between each node and its cross-space-time neighbors and then encourages heterogeneous context aggregation among them. Considering the HetGCN as a generic convolution formulation, we further develop it into two specific instantiations (i.e., intra-scale and inter-scale HetGCN) that significantly facilitate cross-space-time and cross-scale learning on skeleton graphs. By integrating these modules, we propose a strong human action recognition system that outperforms state-of-the-art methods with accuracies of 93.1% on the NTU-60 cross-subject (X-Sub) benchmark, 88.9% on the NTU-120 X-Sub benchmark, and 38.4% on the Kinetics-Skeleton benchmark.
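The idea of aggregating a node's spatial, temporal, and spatial-temporal neighborhoods through relation-specific transformations can be illustrated with a minimal sketch. This is not the authors' implementation: the layer below is a toy NumPy version in which each relation type gets its own adjacency matrix and weight matrix, and `het_gcn_layer`, the joint/frame sizes, and the mean-aggregation scheme are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
T, V, C_in, C_out = 2, 3, 4, 8          # frames, joints, channels (toy sizes)
N = T * V                                # nodes in the space-time graph

def idx(t, v):                           # flatten (frame, joint) -> node id
    return t * V + v

# Three relation-specific adjacency matrices over the space-time graph.
A_sp = np.zeros((N, N)); A_tm = np.zeros((N, N)); A_st = np.zeros((N, N))
for t in range(T):
    for v in range(V):
        for u in range(V):
            if u != v:
                A_sp[idx(t, v), idx(t, u)] = 1        # spatial: same frame
for t in range(T - 1):
    for v in range(V):
        A_tm[idx(t, v), idx(t + 1, v)] = 1            # temporal: same joint
        A_tm[idx(t + 1, v), idx(t, v)] = 1
        for u in range(V):
            if u != v:
                A_st[idx(t, v), idx(t + 1, u)] = 1    # cross space-time
                A_st[idx(t + 1, u), idx(t, v)] = 1

def het_gcn_layer(x, adjs, weights):
    """Heterogeneous aggregation: each relation has its own projection,
    and the per-relation messages are summed before the nonlinearity."""
    out = np.zeros((x.shape[0], weights[0].shape[1]))
    for a, w in zip(adjs, weights):
        deg = a.sum(axis=1, keepdims=True)
        deg[deg == 0] = 1.0                           # avoid divide-by-zero
        out += (a / deg) @ x @ w                      # mean-aggregate, project
    return np.maximum(out, 0.0)                       # ReLU

x = rng.standard_normal((N, C_in))                    # one feature row per node
weights = [rng.standard_normal((C_in, C_out)) * 0.1 for _ in range(3)]
y = het_gcn_layer(x, (A_sp, A_tm, A_st), weights)
print(y.shape)  # (6, 8)
```

A (2 + 1)-D convolution would use only `A_sp` and `A_tm` in separate stages, while a full 3-D kernel would mix all neighbors through a single weight; keeping a distinct weight per relation is what lets the heterogeneous formulation treat the three cues differently.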