School of Electronics and Information Engineering, Korea Aerospace University, Goyang 10540, Republic of Korea.
Sensors (Basel). 2023 Jan 10;23(2):778. doi: 10.3390/s23020778.
Human action recognition (HAR) has attracted considerable attention among computer vision researchers because it supports accessible, intelligent, and efficient remote applications such as the Internet of Things, rehabilitation, autonomous driving, virtual games, and healthcare. Several methods have already been proposed for effective and efficient action recognition, approaching the problem from different perspectives including data modalities, feature design, network configuration, and application domains. In this article, we design a new deep learning model that integrates criss-cross attention and edge convolution to extract discriminative features from skeleton sequences for action recognition. The attention mechanism is applied in the spatial and temporal directions to capture intra- and inter-frame relationships. Several edge convolutional layers are then applied to explore the geometric relationships among neighboring joints in the human body. After each layer, the graph is recomputed from the k-nearest joints, so the model is dynamically updated and learns both local and global information in action sequences. We evaluated the proposed method on two publicly available benchmark skeleton datasets: UTD-MHAD (University of Texas at Dallas multimodal human action dataset) and MSR-Action3D (Microsoft action 3D). We also tested the proposed method with different network architecture configurations to verify its effectiveness and robustness. The proposed method achieved average accuracies of 99.53% and 95.64% on the UTD-MHAD and MSR-Action3D datasets, respectively, outperforming state-of-the-art methods.
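The edge convolution step with a graph recomputed from the k-nearest joints can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not give the layer definition, so the sketch assumes the common EdgeConv formulation (concatenate each joint's feature with its offset to every neighbor, apply a shared linear map and ReLU, then max-pool over neighbors). The function names `knn_indices` and `edge_conv`, the random weights, the joint count of 20, and k = 4 are all hypothetical choices made for illustration.

```python
import numpy as np


def knn_indices(x, k):
    """Indices of the k nearest joints (excluding the joint itself).

    x: (V, C) array holding one feature vector per joint.
    Returns an integer array of shape (V, k).
    """
    x = np.asarray(x, dtype=float)
    diff = x[:, None, :] - x[None, :, :]      # (V, V, C) pairwise differences
    dist = np.sum(diff ** 2, axis=-1)         # (V, V) squared Euclidean distances
    np.fill_diagonal(dist, np.inf)            # a joint is not its own neighbor
    return np.argsort(dist, axis=1)[:, :k]


def edge_conv(x, weight, k):
    """One EdgeConv-style layer (assumed formulation, see lead-in).

    x: (V, C_in) joint features, weight: (2 * C_in, C_out) shared linear map.
    The k-NN graph is rebuilt from x on every call, so stacking layers
    recomputes the graph in the current feature space.
    """
    x = np.asarray(x, dtype=float)
    idx = knn_indices(x, k)                               # (V, k)
    neighbors = x[idx]                                    # (V, k, C_in)
    center = np.broadcast_to(x[:, None, :], neighbors.shape)
    edge_feat = np.concatenate([center, neighbors - center], axis=-1)
    out = np.maximum(edge_feat @ weight, 0.0)             # shared linear map + ReLU
    return out.max(axis=1)                                # max over the k neighbors


# Illustrative run on random data: 20 joints with 3D coordinates (hypothetical).
rng = np.random.default_rng(0)
joints = rng.normal(size=(20, 3))
w1 = rng.normal(size=(6, 16))     # 2 * 3 input channels  -> 16 output channels
w2 = rng.normal(size=(32, 32))    # 2 * 16 input channels -> 32 output channels

h1 = edge_conv(joints, w1, k=4)   # graph built from raw joint coordinates
h2 = edge_conv(h1, w2, k=4)       # graph rebuilt from the learned features
print(h1.shape, h2.shape)
```

Because the second call builds its neighborhoods from `h1` rather than from the raw coordinates, joints that are far apart on the body can become neighbors once their features are similar, which is one way to read the abstract's claim that recomputing the graph lets the model pick up global as well as local information.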