
Learning Heterogeneous Spatial-Temporal Context for Skeleton-Based Action Recognition.

Author Information

Gao Xuehao, Yang Yang, Wu Yang, Du Shaoyi

Publication Information

IEEE Trans Neural Netw Learn Syst. 2024 Sep;35(9):12130-12141. doi: 10.1109/TNNLS.2023.3252172. Epub 2024 Sep 3.

Abstract

Graph convolution networks (GCNs) have been widely used and have achieved fruitful progress in the skeleton-based action recognition task. In GCNs, node interaction modeling dominates the context aggregation and is therefore crucial for a graph-based convolution kernel to extract representative features. In this article, we take a closer look at a powerful graph convolution formulation for capturing rich movement patterns from skeleton-based graphs. Specifically, we propose a novel heterogeneous graph convolution (HetGCN) that can be considered a middle ground between the extremes of (2 + 1)-D and 3-D graph convolution. The core observation behind HetGCN is that multiple information flows are jointly intertwined in a 3-D convolution kernel, including spatial, temporal, and spatial-temporal cues. Since spatial and temporal information flows characterize different cues for action recognition, HetGCN first dynamically analyzes the pairwise interactions between each node and its cross-space-time neighbors and then encourages heterogeneous context aggregation among them. Treating HetGCN as a generic convolution formulation, we further develop it into two specific instantiations (i.e., intra-scale and inter-scale HetGCN) that significantly facilitate cross-space-time and cross-scale learning on skeleton graphs. By integrating these modules, we propose a strong human action recognition system that outperforms state-of-the-art methods, achieving accuracies of 93.1% on the NTU-60 cross-subject (X-Sub) benchmark, 88.9% on the NTU-120 X-Sub benchmark, and 38.4% on Kinetics-Skeleton.
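To make the idea of heterogeneous context aggregation concrete, below is a minimal PyTorch sketch of one such layer. It is not the authors' implementation: the tensor layout (batch, channels, frames, joints), the dot-product affinity used for the dynamic pairwise interactions, the wrap-around temporal neighborhood via torch.roll, and all names such as HetGCNSketch are illustrative assumptions. The sketch only shows the core pattern the abstract describes: separate weights for the spatial, temporal, and spatial-temporal information flows, gated by data-dependent node affinities.

```python
# Illustrative sketch only -- not the paper's released code. Assumes
# skeleton input of shape (batch, channels, frames, joints); all names
# (HetGCNSketch, w_spatial, ...) are hypothetical.
import torch
import torch.nn as nn
import torch.nn.functional as F


class HetGCNSketch(nn.Module):
    """Toy heterogeneous graph convolution: aggregates spatial, temporal,
    and spatial-temporal neighbor context with separate (heterogeneous)
    weights, gated by dynamic pairwise affinities."""

    def __init__(self, in_ch, out_ch, num_joints, tau=1):
        super().__init__()
        self.tau = tau  # temporal neighborhood radius (+/- tau frames)
        # One projection per information flow (the "heterogeneous" part).
        self.w_spatial = nn.Conv2d(in_ch, out_ch, 1)
        self.w_temporal = nn.Conv2d(in_ch, out_ch, 1)
        self.w_st = nn.Conv2d(in_ch, out_ch, 1)
        # Embeddings for the dynamic pairwise interactions (dot-product affinity).
        self.theta = nn.Conv2d(in_ch, out_ch // 2, 1)
        self.phi = nn.Conv2d(in_ch, out_ch // 2, 1)
        self.A = nn.Parameter(torch.eye(num_joints))  # learnable spatial graph

    def dynamic_adj(self, x):
        # x: (N, C, T, V) -> per-frame affinity matrix (N, T, V, V)
        q = self.theta(x).permute(0, 2, 3, 1)   # (N, T, V, C')
        k = self.phi(x).permute(0, 2, 1, 3)     # (N, T, C', V)
        aff = torch.softmax(q @ k, dim=-1)      # data-driven pairwise interactions
        return aff + self.A                     # combined with the learned graph

    def forward(self, x):
        # x: (N, C, T, V)
        adj = self.dynamic_adj(x)               # (N, T, V, V)

        # Spatial flow: aggregate same-frame joints via the affinity matrix.
        xs = torch.einsum('nctv,ntvw->nctw', x, adj)
        out = self.w_spatial(xs)

        # Temporal flow: the same joint across +/- tau frames (unnormalized
        # sum; roll wraps around the clip boundary -- a sketch simplification).
        xt = sum(torch.roll(x, s, dims=2) for s in range(-self.tau, self.tau + 1))
        out = out + self.w_temporal(xt)

        # Spatial-temporal flow: neighboring joints in neighboring frames.
        xst = sum(torch.einsum('nctv,ntvw->nctw', torch.roll(x, s, dims=2), adj)
                  for s in (-self.tau, self.tau))
        out = out + self.w_st(xst)
        return F.relu(out)


# Usage: 2 clips, 3 channels (x, y, confidence), 16 frames, 25 NTU joints.
x = torch.randn(2, 3, 16, 25)
layer = HetGCNSketch(in_ch=3, out_ch=64, num_joints=25)
print(layer(x).shape)  # torch.Size([2, 64, 16, 25])
```

This corresponds only to a single-resolution (intra-scale) kernel; the inter-scale instantiation in the paper additionally exchanges context across skeleton graphs of different scales.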

