School of Computer Science, Hangzhou Dianzi University, Hangzhou 310000, China.
Comput Intell Neurosci. 2021 Aug 18;2021:1507770. doi: 10.1155/2021/1507770. eCollection 2021.
Skeleton-based human action recognition has attracted much attention in computer vision. Most previous studies rely on fixed skeleton graphs, which capture only the local physical dependencies among joints and miss implicit joint correlations. In addition, the same action can look very different from different views, and in some views keypoints are occluded, which causes recognition errors. This paper proposes an action recognition method based on a distance vector and multihigh view adaptive network (DV-MHNet) to address these problems. The multihigh (MH) view adaptive networks automatically determine the best observation view at different heights, obtain complete keypoint information for the current frame, and make the model more robust and better able to generalize when recognizing actions at different heights. On this basis, the distance vector (DV) mechanism establishes the relative distance and relative orientation between different keypoints in the same frame and between the same keypoints in different frames, yielding the global potential relationship of each keypoint. Finally, a spatial temporal graph convolutional network is constructed to learn the action features from both spatial and temporal information. An ablation study against a traditional spatial temporal graph convolutional network, with and without the multihigh view adaptive networks, supports the effectiveness of the model. The model is evaluated on two widely used action recognition benchmarks (NTU-RGB + D and PKU-MMD) and achieves better performance on both datasets.
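The distance vector idea described in the abstract can be illustrated with a short NumPy sketch. This is not the paper's implementation: the abstract does not give the exact formulation, so the function names, the array layout (frames × joints × 3D coordinates), the use of consecutive frames for the temporal case, and the 25-joint random skeleton are all assumptions made for illustration.

```python
import numpy as np

def spatial_distance_vectors(frame, eps=1e-8):
    """Relative distance and orientation between all joint pairs in one frame.

    frame: (J, 3) array of joint coordinates (assumed layout).
    Returns dist of shape (J, J) and direction of shape (J, J, 3),
    where direction[i, j] is the unit vector pointing from joint i to joint j.
    """
    diff = frame[None, :, :] - frame[:, None, :]   # diff[i, j] = frame[j] - frame[i]
    dist = np.linalg.norm(diff, axis=-1)
    direction = diff / (dist[..., None] + eps)     # eps avoids division by zero on the diagonal
    return dist, direction

def temporal_distance_vectors(seq, eps=1e-8):
    """Relative distance and orientation of each joint between consecutive frames.

    seq: (T, J, 3) array (assumed layout).
    Returns dist of shape (T-1, J) and direction of shape (T-1, J, 3).
    """
    diff = seq[1:] - seq[:-1]
    dist = np.linalg.norm(diff, axis=-1)
    direction = diff / (dist[..., None] + eps)
    return dist, direction

# Hypothetical input: 4 frames of a 25-joint skeleton with random coordinates.
rng = np.random.default_rng(0)
seq = rng.normal(size=(4, 25, 3))

s_dist, s_dir = spatial_distance_vectors(seq[0])
t_dist, t_dir = temporal_distance_vectors(seq)
print(s_dist.shape, s_dir.shape, t_dist.shape, t_dir.shape)
```

In a pipeline of this kind, such distance and orientation features would typically be concatenated with the raw joint coordinates before being passed to a spatial temporal graph convolutional network; how DV-MHNet actually combines them is not stated in the abstract.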