Centro Algoritmi, University of Minho, Campus of Azurém, 4800-058 Guimarães, Portugal.
DIBRIS, University of Genoa, 13-16145 Genoa, Italy.
Sensors (Basel). 2021 Jun 25;21(13):4342. doi: 10.3390/s21134342.
Individuals with Autism Spectrum Disorder (ASD) typically present difficulties in engaging and interacting with their peers. Thus, researchers have been developing different technological solutions as support tools for children with ASD. Social robots, one example of these technological solutions, are often unaware of their game partners, preventing the automatic adaptation of their behavior to the user. One source of information that can enrich this interaction and, consequently, adapt the system's behavior is the recognition of the user's actions using RGB cameras and/or depth sensors. The present work proposes a method to automatically detect, in real time, typical and stereotypical actions of children with ASD, using the Intel RealSense camera and the Nuitrack SDK to detect and extract the user's joint coordinates. The pipeline starts by mapping the temporal and spatial joint dynamics onto a color image-based representation. Usually, the positions of the joints in the final image are clustered into groups. To verify whether the sequence of joints in the final image representation can influence the model's performance, two main experiments were conducted: in the first, the order of the grouped joints in the sequence was changed; in the second, the joints were randomly ordered. In each experiment, statistical methods were used in the analysis. Based on these experiments, statistically significant differences were found concerning the joint sequence in the image, indicating that the order of the joints may impact the model's performance. The final model, a Convolutional Neural Network (CNN), trained on the different actions (typical and stereotypical), was used to classify the different patterns of behavior, achieving a mean accuracy of 92.4% ± 0.0% on the test data. The entire pipeline ran on average at 31 FPS.
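The mapping of joint dynamics onto a color image described above can be sketched as follows. This is a minimal illustration, not the authors' exact implementation: it assumes each frame provides 3D coordinates for a fixed set of joints (Nuitrack-style skeletons), stacks joints as image rows and frames as columns, and rescales the x, y, z coordinates into the three color channels. The function name `joints_to_image` and the normalization scheme are hypothetical.

```python
import numpy as np

def joints_to_image(seq):
    """Encode a skeleton sequence as an RGB image.

    seq: array of shape (frames, joints, 3), holding (x, y, z)
    coordinates per joint per frame.
    Returns a uint8 image of shape (joints, frames, 3): each row is
    one joint, each column one frame, and the three channels carry
    the x, y, z coordinates rescaled to [0, 255].
    """
    seq = np.asarray(seq, dtype=np.float64)
    lo = seq.min(axis=(0, 1), keepdims=True)          # per-axis minimum
    hi = seq.max(axis=(0, 1), keepdims=True)          # per-axis maximum
    norm = (seq - lo) / np.maximum(hi - lo, 1e-8)     # avoid divide-by-zero
    img = (norm * 255.0).astype(np.uint8)
    return img.transpose(1, 0, 2)                     # (joints, frames, 3)

# Example: a 30-frame window of 19 tracked joints (illustrative numbers)
rng = np.random.default_rng(0)
image = joints_to_image(rng.uniform(-1.0, 1.0, size=(30, 19, 3)))
print(image.shape)  # (19, 30, 3)
```

A fixed, deliberate row ordering matters here: as the experiments in the abstract suggest, reshuffling which joint occupies which row changes the local spatial patterns a CNN's convolutions can exploit.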