IEEE Trans Image Process. 2018 Sep;27(9):4382-4394. doi: 10.1109/TIP.2018.2837386.
Skeleton-based action recognition has recently become popular owing to the development of cost-effective depth sensors and fast pose estimation algorithms. Traditional methods based on pose descriptors often fail on large-scale datasets because of the limited representational power of engineered features. Recent recurrent neural network (RNN) based approaches mostly focus on the temporal evolution of body joints and neglect their geometric relations. In this paper, we aim to leverage the geometric relations among joints for action recognition. We introduce three primitive geometries: joints, edges, and surfaces. Accordingly, a generic end-to-end RNN-based network is designed to accommodate the three inputs. For action recognition, a novel viewpoint transformation layer and temporal dropout layers are employed in the RNN-based network to learn robust representations. For action detection, we first perform frame-wise action classification and then apply a novel multi-scale sliding window algorithm. Experiments on large-scale 3D action recognition benchmark datasets show that joints, edges, and surfaces are effective and complementary for different actions. Our approaches dramatically outperform existing state-of-the-art methods on both action recognition and action detection.
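The three primitive geometries named in the abstract can be illustrated with a minimal sketch, assuming skeleton frames arrive as (J, 3) arrays of 3D joint coordinates. The bone list, joint triplets, and toy skeleton below are hypothetical placeholders for illustration, not the paper's exact definitions: edges are taken here as displacement vectors between connected joints, and surfaces as unit normals of planes spanned by joint triplets.

```python
import numpy as np

def edge_features(joints, bones):
    """Edges: displacement vectors between connected joint pairs."""
    return np.array([joints[b] - joints[a] for a, b in bones])

def surface_features(joints, triplets):
    """Surfaces: unit normals of planes spanned by joint triplets."""
    normals = []
    for a, b, c in triplets:
        n = np.cross(joints[b] - joints[a], joints[c] - joints[a])
        normals.append(n / (np.linalg.norm(n) + 1e-8))  # avoid div by 0
    return np.array(normals)

# Hypothetical 4-joint skeleton frame: hip, spine, shoulder, elbow.
frame = np.array([[0.0, 0.0, 0.0],
                  [0.0, 0.5, 0.0],
                  [0.2, 0.9, 0.0],
                  [0.5, 0.9, 0.1]])
bones = [(0, 1), (1, 2), (2, 3)]       # assumed connectivity
triplets = [(0, 1, 2), (1, 2, 3)]      # assumed joint triplets

edges = edge_features(frame, bones)        # shape (3, 3)
surfaces = surface_features(frame, triplets)  # shape (2, 3)
print(edges.shape, surfaces.shape)
```

In the paper's pipeline these per-frame joint, edge, and surface features would each feed one branch of the RNN-based network; the sketch only shows how such geometric inputs could be derived from raw joint coordinates.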