IEEE Trans Pattern Anal Mach Intell. 2022 Jun;44(6):3316-3333. doi: 10.1109/TPAMI.2021.3053765. Epub 2022 May 5.
3D skeleton-based action recognition and motion prediction are two essential problems of human activity understanding. Many previous works 1) study the two tasks separately, neglecting their internal correlations, and 2) do not capture sufficient relations inside the body. To address these issues, we propose a symbiotic model that handles the two tasks jointly, and we propose graphs at two scales to explicitly capture relations among body-joints and body-parts. Together, these form symbiotic graph neural networks, which contain a backbone, an action-recognition head, and a motion-prediction head. The two heads are trained jointly and enhance each other. For the backbone, we propose multi-branch multiscale graph convolution networks to extract spatial and temporal features. The multiscale graph convolution networks are based on joint-scale and part-scale graphs. The joint-scale graphs contain actional graphs, which capture action-based relations, and structural graphs, which capture physical constraints. The part-scale graphs integrate body-joints to form specific parts, representing high-level relations. Moreover, dual bone-based graphs and networks are proposed to learn complementary features. We conduct extensive experiments on skeleton-based action recognition and motion prediction with four datasets: NTU-RGB+D, Kinetics, Human3.6M, and CMU Mocap. The experiments show that our symbiotic graph neural networks achieve better performance on both tasks than state-of-the-art methods.
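The abstract names the backbone's components without showing how they operate on a skeleton. As a rough illustration only, the NumPy sketch below shows three of those ideas on a hypothetical five-joint skeleton: a joint-scale graph convolution over a structural (physical-connection) graph, pooling joints into part-scale nodes, and bone vectors of the kind a dual bone-based stream could consume. The skeleton layout, the part grouping, the symmetric normalization, and the function names (`structural_adjacency`, `graph_conv`, `joint_to_part`, `bones`) are assumptions made for this example. This is not the authors' implementation, and it omits the learned actional graphs, the temporal convolutions, and both task heads.

```python
import numpy as np

# Hypothetical toy skeleton (NOT the NTU-RGB+D or Human3.6M layout):
# joint 0 = torso root, joints 1-2 = left arm, joints 3-4 = right arm.
NUM_JOINTS = 5
EDGES = [(1, 0), (2, 1), (3, 0), (4, 3)]  # (child, parent) bone pairs

# Hypothetical part grouping: each joint belongs to exactly one body part.
PARTS = [[0], [1, 2], [3, 4]]


def structural_adjacency(num_joints, edges):
    """Symmetrically normalized D^{-1/2} (A + I) D^{-1/2} of the physical skeleton."""
    a = np.eye(num_joints)  # self-loops keep every degree >= 1
    for child, parent in edges:
        a[child, parent] = a[parent, child] = 1.0
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))
    return a * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def graph_conv(x, adj, weight):
    """One joint-scale graph convolution: ReLU(A_hat @ X @ W), with X of shape (J, C_in)."""
    return np.maximum(adj @ x @ weight, 0.0)


def joint_to_part(x, parts):
    """Average the features of the joints in each part, giving shape (P, C)."""
    return np.stack([x[idx].mean(axis=0) for idx in parts])


def bones(joints, edges):
    """Bone vectors (child minus parent) for each edge, giving shape (E, 3)."""
    return np.stack([joints[c] - joints[p] for c, p in edges])


# Usage on one random frame of 3D joint coordinates.
rng = np.random.default_rng(0)
joints = rng.standard_normal((NUM_JOINTS, 3))
adj = structural_adjacency(NUM_JOINTS, EDGES)
w = rng.standard_normal((3, 8))          # hypothetical 3 -> 8 channel projection
h = graph_conv(joints, adj, w)           # joint-scale features, shape (5, 8)
p = joint_to_part(h, PARTS)              # part-scale features, shape (3, 8)
b = bones(joints, EDGES)                 # bone features, shape (4, 3)
print(h.shape, p.shape, b.shape)
```

Averaging is used here as the simplest joint-to-part aggregation; the paper's actual fusion between the two scales may differ.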