He Zhiquan, Zhang Lujun, Wang Hengyou
Guangdong Key Laboratory of Intelligent Information Processing, Shenzhen, China.
Guangdong Multimedia Information Service Engineering Technology Research Center, Shenzhen University, Shenzhen, China.
Front Comput Neurosci. 2023 Apr 5;17:1145209. doi: 10.3389/fncom.2023.1145209. eCollection 2023.
Human motion prediction is a fundamental problem in computer vision. In recent years, many deep-learning-based methods have achieved impressive performance on it. However, long-term prediction and deformation of the human skeleton remain challenging. To predict motion accurately, this paper proposes a two-stage, GCN-based prediction method. In the first stage, we train a prediction model that uses multiple cascaded spatial attention graph convolution layers (SAGCL) to extract features and generates an initial sequence of future poses from the observed poses. The initial poses generated in the first stage often deviate from natural human motion; for example, the length of a bone may change over the predicted sequence. The second stage therefore fine-tunes the predicted poses to bring them closer to natural motion. We present a fine-tuning model consisting of multiple cascaded causal temporal-graph convolution layers (CT-GCL), trained with the spatial coordinate error of the joints and the bone length error as loss functions. We validate our model on the Human3.6M and CMU-MoCap datasets. Extensive experiments show that the two-stage prediction method outperforms state-of-the-art methods. We also discuss the limitations of the proposed method, in the hope that they will guide future work.
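The abstract names two loss terms for the fine-tuning stage but does not give their exact formulas. The sketch below is one plausible reading, not the authors' implementation. It assumes the joint term is the mean Euclidean distance per joint (MPJPE-style) and the bone term is the mean absolute difference in bone lengths. The 5-joint skeleton (`PARENTS`) and the weight `lam` are hypothetical. It shows why the bone term matters: a pose whose bones are stretched is penalized by it, while a rigidly shifted pose is not.

```python
import math
from typing import List, Sequence, Tuple

Pose = Sequence[Tuple[float, float, float]]  # one frame: J joints, each (x, y, z)

# Hypothetical 5-joint skeleton: PARENTS[j] is the parent of joint j (-1 = root).
PARENTS = [-1, 0, 1, 0, 3]


def joint_position_error(pred: Sequence[Pose], target: Sequence[Pose]) -> float:
    """Mean Euclidean distance between predicted and ground-truth joints,
    averaged over all frames and joints (assumed MPJPE-style formulation)."""
    total, count = 0.0, 0
    for p_frame, t_frame in zip(pred, target):
        for p, t in zip(p_frame, t_frame):
            total += math.dist(p, t)
            count += 1
    return total / count


def bone_lengths(frame: Pose, parents: Sequence[int]) -> List[float]:
    """Length of every bone (child joint to parent joint) in one frame."""
    return [math.dist(frame[j], frame[par]) for j, par in enumerate(parents) if par >= 0]


def bone_length_error(pred: Sequence[Pose], target: Sequence[Pose],
                      parents: Sequence[int] = PARENTS) -> float:
    """Mean absolute difference between predicted and ground-truth bone lengths."""
    total, count = 0.0, 0
    for p_frame, t_frame in zip(pred, target):
        for lp, lt in zip(bone_lengths(p_frame, parents), bone_lengths(t_frame, parents)):
            total += abs(lp - lt)
            count += 1
    return total / count


def fine_tuning_loss(pred: Sequence[Pose], target: Sequence[Pose],
                     parents: Sequence[int] = PARENTS, lam: float = 1.0) -> float:
    """Joint coordinate error plus lam * bone length error (lam is an assumed weight)."""
    return joint_position_error(pred, target) + lam * bone_length_error(pred, target, parents)
```

For example, if one joint of a single-frame target is moved one unit along its bone, that bone is stretched. The joint term then averages 1/5 over five joints and the bone term averages 1/4 over four bones. Translating the whole pose by one unit gives a joint error of 1 and a bone error of 0.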