Pham Huy Hieu, Salmane Houssam, Khoudour Louahdi, Crouzil Alain, Zegers Pablo, Velastin Sergio A
Cerema, Project team STI, 1 avenue du Colonel Roche, F-31400 Toulouse, France.
Informatics Research Institute of Toulouse (IRIT), Paul Sabatier University, Toulouse 31062, France.
Sensors (Basel). 2019 Apr 24;19(8):1932. doi: 10.3390/s19081932.
Designing motion representations for 3D human action recognition from skeleton sequences is an important yet challenging task. An effective representation should be robust to noise, invariant to viewpoint changes, and deliver good performance at low computational cost. The two main challenges are how to efficiently represent the spatio-temporal patterns of skeletal movements and how to learn their discriminative features for classification. This paper presents a novel skeleton-based representation and a deep learning framework for 3D action recognition using RGB-D sensors. We propose to build an action map called SPMF (), which is a compact image representation built from skeleton poses and their motions. An Adaptive Histogram Equalization (AHE) algorithm is then applied to the SPMF to enhance its local patterns and form an enhanced action map, namely Enhanced-SPMF. For learning and classification, we exploit Deep Convolutional Neural Networks based on the DenseNet architecture to directly learn an end-to-end mapping between input skeleton sequences and their action labels via the Enhanced-SPMFs. The proposed method is evaluated on four challenging benchmark datasets covering individual actions, interactions, multiview and large-scale data. The experimental results demonstrate that the proposed method outperforms previous state-of-the-art approaches on all benchmark tasks, whilst requiring low computational time for training and inference.
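The first two stages described above (encoding poses and motions as an image, then enhancing local patterns) can be illustrated with a minimal NumPy sketch. This is not the authors' SPMF encoding: the min-max colour mapping, the layout of poses followed by motions, the tile size, and the function names `skeleton_to_action_map`, `equalize_tile` and `adaptive_hist_eq` are all assumptions made for illustration. The equalization is a simplified tile-wise variant without the interpolation or contrast limiting found in common AHE implementations, and the DenseNet classifier is omitted.

```python
import numpy as np


def skeleton_to_action_map(seq):
    """Encode a skeleton sequence of shape (T, J, 3) as a (J, 2T-1, 3) uint8 image.

    Assumption: poses and frame-to-frame motions are each min-max scaled to
    [0, 255] and concatenated along the time axis. The paper's exact encoding
    may differ.
    """
    seq = np.asarray(seq, dtype=np.float64)
    motions = np.diff(seq, axis=0)  # (T-1, J, 3) joint displacements

    def to_uint8(x):
        lo, hi = x.min(), x.max()
        scale = hi - lo if hi > lo else 1.0
        return np.round(255.0 * (x - lo) / scale).astype(np.uint8)

    stacked = np.concatenate([to_uint8(seq), to_uint8(motions)], axis=0)
    return stacked.transpose(1, 0, 2)  # rows = joints, columns = time


def equalize_tile(tile):
    """Plain histogram equalization of one single-channel uint8 tile."""
    hist = np.bincount(tile.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]
    denom = tile.size - cdf_min
    if denom == 0:  # constant tile: nothing to equalize
        return tile.copy()
    lut = np.round(255.0 * (cdf - cdf_min) / denom).clip(0, 255).astype(np.uint8)
    return lut[tile]


def adaptive_hist_eq(img, tile=8):
    """Equalize each tile x tile block of each channel independently.

    Simplified AHE: no bilinear blending between neighbouring tiles.
    """
    out = np.empty_like(img)
    height, width = img.shape[:2]
    for c in range(img.shape[2]):
        for r in range(0, height, tile):
            for col in range(0, width, tile):
                block = img[r:r + tile, col:col + tile, c]
                out[r:r + tile, col:col + tile, c] = equalize_tile(block)
    return out


# Synthetic example: 20 frames, 25 joints, random 3D coordinates.
rng = np.random.default_rng(0)
sequence = rng.normal(size=(20, 25, 3))
action_map = skeleton_to_action_map(sequence)
enhanced_map = adaptive_hist_eq(action_map)
```

In a full pipeline, an image such as `enhanced_map` would be resized and passed to a convolutional classifier. Treating the x, y, z coordinates as three colour channels is one plausible choice here, not a detail confirmed by the abstract.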