School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai, 200093, China.
Key Laboratory of Industrial Internet and Big Data, China National Light Industry, Beijing, 100048, China.
Sci Rep. 2022 Mar 8;12(1):4111. doi: 10.1038/s41598-022-08157-5.
There has been significant progress in skeleton-based action recognition. The human skeleton can be naturally represented as a graph, so graph convolution networks have become the most popular approach to this task. Most state-of-the-art methods optimize the structure of the human skeleton graph to obtain better performance. Building on these algorithms, a simple but strong network is proposed with three major contributions. Firstly, inspired by adaptive graph convolution networks and non-local blocks, several self-attention modules are designed to exploit spatial and temporal dependencies and to dynamically optimize the graph structure. Secondly, a lightweight but efficient network architecture is designed for skeleton-based action recognition. Thirdly, a trick is proposed that enriches the skeleton data with bone connection information and noticeably improves performance. The method achieves 90.5% accuracy on the cross-subject setting of NTU60, with 0.89M parameters and a computation cost of 0.32 GMACs. This work is expected to inspire new ideas for the field.
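The abstract does not say how the bone connection information is computed. The sketch below shows one common convention and is an assumption rather than the authors' method: each bone is the vector from a joint's parent to the joint, and it is appended to that joint's coordinates. The 5-joint skeleton, the `PARENTS` table and the function name `add_bone_features` are hypothetical and used only for illustration.

```python
from typing import List, Sequence, Tuple

Joint = Tuple[float, float, float]

# Hypothetical 5-joint toy skeleton (an assumption, not the NTU60 layout):
# PARENTS[i] is the index of the parent of joint i.
# The root (joint 0) is its own parent, so its bone vector is zero.
PARENTS: List[int] = [0, 0, 1, 1, 1]


def add_bone_features(
    frame: Sequence[Joint], parents: Sequence[int] = PARENTS
) -> List[Tuple[float, ...]]:
    """Append a bone vector (joint minus parent joint) to each joint.

    Assumed convention: the result has 6 values per joint,
    (x, y, z, bone_x, bone_y, bone_z).
    """
    if len(frame) != len(parents):
        raise ValueError("frame and parents must have the same length")
    enriched = []
    for joint, parent_idx in zip(frame, parents):
        parent = frame[parent_idx]
        # Bone vector points from the parent joint to this joint.
        bone = tuple(c - p for c, p in zip(joint, parent))
        enriched.append(tuple(joint) + bone)
    return enriched


# One frame of toy 3D joint coordinates.
frame = [
    (0.0, 0.0, 0.0),   # hip (root)
    (0.0, 1.0, 0.0),   # chest
    (0.0, 1.5, 0.0),   # head
    (-0.5, 1.0, 0.0),  # left hand
    (0.5, 1.0, 0.0),   # right hand
]
enriched = add_bone_features(frame)
```

In this sketch the input to a network would have six channels per joint instead of three, so the bone directions and lengths are available without changing the graph itself.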