Liu Yanan, Zhang Hao, Li Yanqiu, He Kangjian, Xu Dan
IEEE Trans Vis Comput Graph. 2023 May;29(5):2575-2585. doi: 10.1109/TVCG.2023.3247075. Epub 2023 Mar 29.
Skeleton-based human action recognition has broad application prospects in virtual reality, because skeleton data is robust to noise such as background interference and changes in camera angle. Recent works treat the human skeleton as a non-grid representation, e.g., a skeleton graph, and learn spatio-temporal patterns via graph convolution operators. However, stacked graph convolutions contribute little to modeling long-range dependencies, which may contain crucial semantic cues about the action. In this work, we introduce a skeleton large kernel attention operator (SLKA), which enlarges the receptive field and improves channel adaptability without adding much computational cost. A spatiotemporal SLKA module (ST-SLKA) is then built from it to aggregate long-range spatial features and learn long-distance temporal correlations. On this basis, we design a novel skeleton-based action recognition architecture, the spatiotemporal large-kernel attention graph convolution network (LKA-GCN). In addition, because frames with large movements may carry significant action information, we propose a joint movement modeling strategy (JMM) that focuses on valuable temporal interactions. On the NTU-RGBD 60, NTU-RGBD 120 and Kinetics-Skeleton 400 action datasets, LKA-GCN achieves state-of-the-art performance.
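The abstract does not say how JMM identifies large-movement frames, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes that a frame's movement can be scored by the mean Euclidean displacement of its joints relative to the previous frame, and that these scores can be turned into temporal weights with a softmax. The names `frame_motion`, `motion_weights` and `temperature` are hypothetical and do not come from the paper.

```python
import math

def frame_motion(sequence):
    """Return one motion score per frame of a skeleton sequence.

    sequence: list of frames, each a list of (x, y, z) joint coordinates.
    Assumption: the score of frame t is the mean Euclidean displacement of
    its joints relative to frame t-1. Frame 0 has no predecessor, so it
    is given a score of 0.0.
    """
    scores = [0.0]
    for prev, curr in zip(sequence, sequence[1:]):
        total = sum(math.dist(p, c) for p, c in zip(prev, curr))
        scores.append(total / len(curr))
    return scores

def motion_weights(scores, temperature=1.0):
    """Softmax over motion scores (a hypothetical weighting scheme)."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp((s - peak) / temperature) for s in scores]
    norm = sum(exps)
    return [e / norm for e in exps]

# Toy sequence: 4 frames, 2 joints; joint 0 jumps between frames 1 and 2.
toy = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 3.0, 4.0), (1.0, 0.0, 0.0)],
    [(0.0, 3.0, 4.0), (1.0, 0.0, 0.0)],
]
scores = frame_motion(toy)
weights = motion_weights(scores)
```

In this toy sequence only frame 2 has a nonzero score, so it receives the largest weight. A real model would more likely learn such a weighting than fix it by hand.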