Suppr 超能文献


Deep Fusion of Skeleton Spatial-Temporal and Dynamic Information for Action Recognition

Authors

Gao Song, Zhang Dingzhuo, Tang Zhaoming, Wang Hongyan

Affiliations

Aviation Maintenance NCO Academy, Air Force Engineering University, Xinyang 464007, China.

College of Information Engineering, Dalian University, Dalian 116622, China.

Publication

Sensors (Basel). 2024 Nov 28;24(23):7609. doi: 10.3390/s24237609.

DOI: 10.3390/s24237609
PMID: 39686146
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11645088/
Abstract

Focusing on the issue of the low recognition rates achieved by traditional deep-information-based action recognition algorithms, an action recognition approach was developed based on skeleton spatial-temporal and dynamic features combined with a two-stream convolutional neural network (TS-CNN). Firstly, the skeleton's three-dimensional coordinate system was transformed to obtain coordinate information related to relative joint positions. Subsequently, this relevant joint information was encoded as a color texture map to construct the spatial-temporal feature descriptor of the skeleton. Furthermore, physical structure constraints of the human body were considered to enhance class differences. Additionally, the speed information for each joint was estimated and encoded as a color texture map to achieve the skeleton motion feature descriptor. The resulting spatial-temporal and dynamic features were further enhanced using motion saliency and morphology operators to improve their expression ability. Finally, these enhanced skeleton spatial-temporal and dynamic features were deeply fused via TS-CNN for implementing action recognition. Numerous results from experiments conducted on the publicly available datasets NTU RGB-D, Northwestern-UCLA, and UTD-MHAD demonstrate that the recognition rates achieved via the developed approach are 86.25%, 87.37%, and 93.75%, respectively, indicating that the approach can effectively improve the accuracy of action recognition in complex environments compared to state-of-the-art algorithms.
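The two feature descriptors the abstract describes — relative joint positions and per-joint velocities, each min-max normalized into an RGB "color texture map" — can be sketched as below. This is a minimal illustration, not the paper's exact method: the function names, the choice of reference joint, and the plain per-axis normalization are assumptions, and the saliency/morphology enhancement and TS-CNN fusion steps are omitted.

```python
import numpy as np

def spatial_texture_map(seq, ref_joint=0):
    """Encode a skeleton sequence as a spatial-temporal color texture map.

    seq: (T, J, 3) array of joint xyz coordinates over T frames.
    Joint positions are taken relative to a reference joint (assumed here
    to be joint 0, e.g. the hip), then each axis is min-max normalized to
    [0, 255] so that x/y/z map to the R/G/B channels of a (T, J) image.
    """
    rel = seq - seq[:, ref_joint:ref_joint + 1, :]        # relative joint positions
    lo = rel.min(axis=(0, 1), keepdims=True)
    hi = rel.max(axis=(0, 1), keepdims=True)
    norm = (rel - lo) / np.maximum(hi - lo, 1e-8)          # per-axis min-max scaling
    return (norm * 255.0).astype(np.uint8)                 # (T, J, 3) uint8 image

def dynamic_texture_map(seq):
    """Encode per-joint speed (frame-to-frame displacement) the same way."""
    vel = np.diff(seq, axis=0)                             # (T-1, J, 3) velocities
    lo = vel.min(axis=(0, 1), keepdims=True)
    hi = vel.max(axis=(0, 1), keepdims=True)
    norm = (vel - lo) / np.maximum(hi - lo, 1e-8)
    return (norm * 255.0).astype(np.uint8)                 # (T-1, J, 3) uint8 image
```

In a two-stream setup, the spatial map would feed one CNN stream and the dynamic map the other, with their features fused before classification.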


Figures (PMC11645088):
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/018d/11645088/4cc02bfc4414/sensors-24-07609-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/018d/11645088/457b4900ef7b/sensors-24-07609-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/018d/11645088/67a926d33e81/sensors-24-07609-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/018d/11645088/3d244221c96c/sensors-24-07609-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/018d/11645088/758fd56c3c88/sensors-24-07609-g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/018d/11645088/e1dc3a61b3e4/sensors-24-07609-g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/018d/11645088/9a8512ff3898/sensors-24-07609-g007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/018d/11645088/00afea57f9f6/sensors-24-07609-g008.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/018d/11645088/8887ec1f43bb/sensors-24-07609-g009.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/018d/11645088/cc02c094fb28/sensors-24-07609-g010.jpg

Similar Articles

1. Deep Fusion of Skeleton Spatial-Temporal and Dynamic Information for Action Recognition. Sensors (Basel). 2024 Nov 28;24(23):7609. doi: 10.3390/s24237609.
2. Using Direct Acyclic Graphs to Enhance Skeleton-Based Action Recognition with a Linear-Map Convolution Neural Network. Sensors (Basel). 2021 Apr 29;21(9):3112. doi: 10.3390/s21093112.
3. Multi-Modality Adaptive Feature Fusion Graph Convolutional Network for Skeleton-Based Action Recognition. Sensors (Basel). 2023 Jun 7;23(12):5414. doi: 10.3390/s23125414.
4. Multi-scale and attention enhanced graph convolution network for skeleton-based violence action recognition. Front Neurorobot. 2022 Dec 15;16:1091361. doi: 10.3389/fnbot.2022.1091361. eCollection 2022.
5. MSST-RT: Multi-Stream Spatial-Temporal Relative Transformer for Skeleton-Based Action Recognition. Sensors (Basel). 2021 Aug 7;21(16):5339. doi: 10.3390/s21165339.
6. Dynamic Edge Convolutional Neural Network for Skeleton-Based Human Action Recognition. Sensors (Basel). 2023 Jan 10;23(2):778. doi: 10.3390/s23020778.
7. Feedback Graph Convolutional Network for Skeleton-Based Action Recognition. IEEE Trans Image Process. 2022;31:164-175. doi: 10.1109/TIP.2021.3129117. Epub 2021 Dec 2.
8. A discriminative multi-modal adaptation neural network model for video action recognition. Neural Netw. 2025 May;185:107114. doi: 10.1016/j.neunet.2024.107114. Epub 2025 Jan 3.
9. TFC-GCN: Lightweight Temporal Feature Cross-Extraction Graph Convolutional Network for Skeleton-Based Action Recognition. Sensors (Basel). 2023 Jun 15;23(12):5593. doi: 10.3390/s23125593.
10. Two-stream spatio-temporal GCN-transformer networks for skeleton-based action recognition. Sci Rep. 2025 Feb 10;15(1):4982. doi: 10.1038/s41598-025-87752-8.

References Cited in This Article

1. Exploring 3D Human Action Recognition Using STACOG on Multi-View Depth Motion Maps Sequences. Sensors (Basel). 2021 May 24;21(11):3642. doi: 10.3390/s21113642.
2. NTU RGB+D 120: A Large-Scale Benchmark for 3D Human Activity Understanding. IEEE Trans Pattern Anal Mach Intell. 2020 Oct;42(10):2684-2701. doi: 10.1109/TPAMI.2019.2916873. Epub 2019 May 14.
3. Utilising the Intel RealSense Camera for Measuring Health Outcomes in Clinical Research. J Med Syst. 2018 Feb 5;42(3):53. doi: 10.1007/s10916-018-0905-x.
4. Latent Max-Margin Multitask Learning With Skelets for 3-D Action Recognition. IEEE Trans Cybern. 2017 Feb;47(2):439-448. doi: 10.1109/TCYB.2016.2519448. Epub 2016 Feb 2.