IEEE Trans Image Process. 2021;30:2263-2275. doi: 10.1109/TIP.2021.3051495. Epub 2021 Jan 26.
Skeleton-based human action recognition has recently attracted considerable research attention in computer vision. Graph convolutional networks (GCNs), which model human body skeletons as spatial-temporal graphs, have shown excellent results. However, existing methods focus only on the local physical connections between joints and ignore the non-physical dependencies among them. To address this issue, we propose a hypergraph neural network (Hyper-GNN) that captures both spatial-temporal information and high-order dependencies for skeleton-based action recognition. In particular, to reduce the influence of noise caused by unrelated joints, we design the Hyper-GNN to extract local and global structure information through the construction of hyperedges (i.e., non-physical connections). In addition, a hypergraph attention mechanism and an improved residual module are introduced to obtain more discriminative feature representations. Finally, a three-stream Hyper-GNN fusion architecture is adopted in the overall framework for action recognition. Experimental results on two benchmark datasets demonstrate that the proposed method achieves the best performance when compared with state-of-the-art skeleton-based methods.
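To make the idea of hyperedges as non-physical connections concrete, here is a minimal sketch of a single hypergraph convolution step over skeleton joints. The abstract does not specify the layer, so this assumes the commonly used normalized form X' = Dv^(-1/2) H W De^(-1) H^T Dv^(-1/2) X Θ as a stand-in. The function name `hypergraph_conv`, the 5-joint toy skeleton, and the hyperedge groupings are hypothetical. The attention mechanism, residual module, and three-stream fusion are not modeled.

```python
import numpy as np


def hypergraph_conv(X, H, Theta, edge_weights=None):
    """One normalized hypergraph convolution step (illustrative sketch).

    X:     (num_joints, in_dim) joint features
    H:     (num_joints, num_hyperedges) incidence matrix; H[v, e] = 1 if
           joint v belongs to hyperedge e
    Theta: (in_dim, out_dim) learnable projection (fixed here)
    """
    H = np.asarray(H, dtype=float)
    n_edges = H.shape[1]
    w = np.ones(n_edges) if edge_weights is None else np.asarray(edge_weights, dtype=float)

    dv = H @ w            # weighted degree of each joint
    de = H.sum(axis=0)    # number of joints in each hyperedge

    # Invert degrees, leaving zeros where a joint or hyperedge is empty.
    dv_inv_sqrt = np.zeros_like(dv)
    dv_inv_sqrt[dv > 0] = dv[dv > 0] ** -0.5
    de_inv = np.zeros_like(de)
    de_inv[de > 0] = 1.0 / de[de > 0]

    # A = Dv^(-1/2) H W De^(-1) H^T Dv^(-1/2)
    left = dv_inv_sqrt[:, None] * H * (w * de_inv)[None, :]
    right = H.T * dv_inv_sqrt[None, :]
    return left @ right @ X @ Theta


# Hypothetical toy skeleton with 5 joints and 3 hyperedges:
# two local groups of adjacent joints plus one non-physical pair (joints 0 and 4).
H_toy = np.array([
    [1, 0, 1],
    [1, 0, 0],
    [1, 1, 0],
    [0, 1, 0],
    [0, 1, 1],
])
rng = np.random.default_rng(0)
X_toy = rng.normal(size=(5, 3))      # e.g. 3D joint coordinates
Theta_toy = rng.normal(size=(3, 4))  # project to 4 feature channels
out_toy = hypergraph_conv(X_toy, H_toy, Theta_toy)
```

Because a hyperedge can contain any number of joints, the third column of `H_toy` lets joints 0 and 4 exchange information in one step even though they are not physically adjacent, which an ordinary skeleton graph edge set would not allow.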