Department of Electrical Engineering, National Taipei University of Technology, Taipei 10608, Taiwan.
Department of Computer Science and Information Engineering, Chaoyang University of Technology, Taichung 413310, Taiwan.
Sensors (Basel). 2021 Apr 29;21(9):3112. doi: 10.3390/s21093112.
Research on human activity recognition can be applied to the monitoring of elderly people living alone to reduce the cost of home care. Video sensors can be easily deployed in different zones of a house to achieve such monitoring. The goal of this study is to employ a linear-map convolutional neural network (CNN) to perform action recognition on RGB videos. To reduce the amount of training data, posture information is represented by skeleton data extracted from 300 frames of each video. The two-stream method was applied to increase recognition accuracy by using the spatial and motion features of the skeleton sequences. The relations between adjacent skeletal joints were employed to build the directed acyclic graph (DAG) matrices: a source matrix and a target matrix. The two feature streams were transformed by the DAG matrices and expanded into color texture images. The linear-map CNN places a two-dimensional linear map at the beginning of each layer to adjust the number of channels, and a two-dimensional CNN was used to recognize the actions. We applied the RGB videos from the action recognition datasets of the NTU RGB+D database, which was established by the Rapid-Rich Object Search Lab, for model training and performance evaluation. The experimental results show that the precision, recall, specificity, F1-score, and accuracy were 86.9%, 86.1%, 99.9%, 86.3%, and 99.5%, respectively, in the cross-subject evaluation, and 94.8%, 94.7%, 99.9%, 94.7%, and 99.9%, respectively, in the cross-view evaluation. An important contribution of this work is that by using the skeleton sequences to produce the spatial and motion features, and the DAG matrices to strengthen the relations between adjacent skeletal joints, the computation was faster than traditional schemes that convolve single-frame images. Therefore, this work demonstrates the practical potential of action recognition in real-life settings.
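The construction of the source and target DAG matrices described above can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the 5-joint skeleton, edge list, and row normalization are assumptions for illustration (the paper uses the 25-joint NTU RGB+D skeleton, and its exact matrix construction is not specified in the abstract).

```python
import numpy as np

# Hypothetical 5-joint skeleton; each edge points from a parent joint
# to a child joint (e.g. spine -> shoulder -> elbow).
edges = [(0, 1), (1, 2), (1, 3), (1, 4)]
num_joints = 5

# Source matrix: entry (i, j) is nonzero when joint i is the source
# (parent) of edge (i, j). Target matrix: entry (j, i) is nonzero when
# joint j is the target (child) of edge (i, j).
A_source = np.zeros((num_joints, num_joints))
A_target = np.zeros((num_joints, num_joints))
for i, j in edges:
    A_source[i, j] = 1.0
    A_target[j, i] = 1.0

def row_normalize(a):
    # Normalize each row so a joint aggregates equally over its neighbors;
    # this normalization is a common graph-model choice, assumed here.
    d = a.sum(axis=1, keepdims=True)
    d[d == 0] = 1.0
    return a / d

A_source = row_normalize(A_source)
A_target = row_normalize(A_target)

# A per-frame skeleton feature X of shape (num_joints, channels) is then
# "transferred by the DAG matrices": each joint's feature becomes a mix
# of its graph neighbors' features, in parent-to-child and child-to-parent
# directions respectively.
X = np.random.rand(num_joints, 3)   # e.g. 3D joint coordinates
X_src = A_source @ X                # propagated along source direction
X_tgt = A_target @ X                # propagated along target direction
```

Stacking such propagated features over the frames of a sequence yields the two-dimensional arrays that can then be rendered as the color texture images fed to the linear-map CNN.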