Faculty of Automatic Control and Computers, University POLITEHNICA of Bucharest, RO-060042 Bucharest, Romania.
Sensors (Basel). 2021 Mar 15;21(6):2051. doi: 10.3390/s21062051.
Action recognition plays an important role in various applications such as video monitoring, automatic video indexing, crowd analysis, human-machine interaction, smart homes and personal assistive robotics. In this paper, we propose improvements to several methods for human action recognition from videos that work with data represented in the form of skeleton poses. These methods are based on the techniques most widely used for this problem: Graph Convolutional Networks (GCNs), Temporal Convolutional Networks (TCNs) and Recurrent Neural Networks (RNNs). First, the paper explores and compares different ways to extract the most relevant spatial and temporal characteristics from a sequence of frames describing an action. Based on this comparative analysis, we show how a TCN-type unit can be extended to operate on features extracted from the spatial domain as well. To validate our approach, we test it on a benchmark often used for human action recognition problems and show that our solution obtains results comparable to the state of the art, with a significant increase in inference speed.
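To make the core idea concrete, the sketch below illustrates, in pure Python, what "extending a TCN-type unit to the spatial domain" can mean: the same dilated 1D convolution that a TCN applies along the time axis of a skeleton sequence is reused along the joint (spatial) axis of each frame. The function names, the scalar per-joint features and the fixed kernel are hypothetical simplifications; the paper's actual model uses learned GCN/TCN layers over multi-dimensional pose features.

```python
def dilated_conv1d(seq, kernel, dilation=1):
    """Causal dilated 1D convolution over a list of scalar features."""
    out = []
    for t in range(len(seq)):
        acc = 0.0
        for i, w in enumerate(kernel):
            j = t - i * dilation  # look back i * dilation steps
            if j >= 0:
                acc += w * seq[j]
        out.append(acc)
    return out


def tcn_unit(frames, kernel, dilation=1, axis="time"):
    """Apply the same 1D unit along time (per joint) or along joints (per frame).

    frames: list of T frames, each a list of J per-joint scalar features.
    axis="time"  -> classic TCN: convolve each joint's trajectory over time.
    axis="space" -> the extension: reuse the unit across the joint dimension.
    """
    T, J = len(frames), len(frames[0])
    if axis == "time":
        # one temporal convolution per joint, then reassemble frame-by-frame
        cols = [dilated_conv1d([frames[t][j] for t in range(T)], kernel, dilation)
                for j in range(J)]
        return [[cols[j][t] for j in range(J)] for t in range(T)]
    # spatial variant: convolve across joints within each frame
    return [dilated_conv1d(frames[t], kernel, dilation) for t in range(T)]
```

For example, with two frames of two joints and an averaging kernel `[0.5, 0.5]`, the temporal variant mixes each joint with its own past value, while the spatial variant mixes each joint with its neighbour inside the same frame.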