Fu Jiajun, Yang Fuxing, Dang Yonghao, Liu Xiaoli, Yin Jianqin
IEEE Trans Neural Netw Learn Syst. 2024 Oct;35(10):14273-14287. doi: 10.1109/TNNLS.2023.3277476. Epub 2024 Oct 7.
Human motion prediction is challenging due to the complexity of spatiotemporal feature modeling. Among all methods, graph convolution networks (GCNs) are extensively utilized because of their superiority in explicit connection modeling. Within a GCN, the graph correlation adjacency matrix drives feature aggregation and is thus the key to extracting predictive motion features. State-of-the-art methods decompose the spatiotemporal correlation into spatial correlations for each frame and temporal correlations for each joint. Directly parameterizing these correlations introduces redundant parameters to represent common relations shared by all frames and all joints. Moreover, the spatiotemporal graph adjacency matrix is the same for different motion samples and thus cannot reflect samplewise correspondence variances. To overcome these two bottlenecks, we propose dynamic spatiotemporal decompose GC (DSTD-GC), which requires only 28.6% of the parameters of the state-of-the-art GC. The key to DSTD-GC is constrained dynamic correlation modeling, which explicitly parameterizes the common static constraints as a spatial/temporal vanilla adjacency matrix shared by all frames/joints and dynamically extracts correspondence variances for each frame/joint with an adjustment modeling function. For each sample, the common constrained adjacency matrices are fixed to represent generic motion patterns, while the extracted variances complete the matrices with specific pattern adjustments. Meanwhile, we mathematically reformulate GCs on spatiotemporal graphs into a unified form and find that DSTD-GC relaxes certain constraints of other GCs, which contributes to a better representation capability. Moreover, by combining DSTD-GC with prior knowledge like body connection and temporal context, we propose a powerful spatiotemporal GCN called DSTD-GCN.
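The constrained dynamic correlation idea can be sketched as a graph-convolution layer that combines a learned static adjacency matrix, shared by all samples and frames, with a lightweight adjustment function that predicts per-frame correlation variances from the input features. This is a minimal illustrative sketch, not the paper's exact formulation; the class name, the similarity-based adjustment function, and the embedding size of 8 are assumptions for illustration.

```python
import torch
import torch.nn as nn


class ConstrainedDynamicGC(nn.Module):
    """Sketch of constrained dynamic correlation modeling (hypothetical).

    A static spatial adjacency matrix, shared by all frames and samples,
    encodes generic motion patterns; a dynamic adjustment, computed from
    the input features, completes it with sample-specific variances.
    """

    def __init__(self, num_joints: int, in_dim: int, out_dim: int, emb_dim: int = 8):
        super().__init__()
        # Static shared adjacency: one matrix for all frames/samples.
        self.static_adj = nn.Parameter(torch.eye(num_joints))
        # Adjustment modeling function: feature-similarity projections
        # (an assumed design, one simple way to extract per-frame variances).
        self.query = nn.Linear(in_dim, emb_dim)
        self.key = nn.Linear(in_dim, emb_dim)
        self.proj = nn.Linear(in_dim, out_dim)
        self.emb_dim = emb_dim

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, frames, joints, in_dim)
        q, k = self.query(x), self.key(x)
        # Dynamic per-frame joint-to-joint adjustment from feature similarity.
        dyn = torch.softmax(q @ k.transpose(-1, -2) / self.emb_dim ** 0.5, dim=-1)
        # Fixed generic pattern + sample-specific adjustment.
        adj = self.static_adj + dyn
        # Graph convolution: aggregate projected joint features via adj.
        return adj @ self.proj(x)


# Usage: 2 samples, 10 frames, 22 joints, 16 input channels.
x = torch.randn(2, 10, 22, 16)
layer = ConstrainedDynamicGC(num_joints=22, in_dim=16, out_dim=32)
y = layer(x)
print(y.shape)  # torch.Size([2, 10, 22, 32])
```

Because the static matrix is shared across frames while the adjustment is computed per frame and per sample, the parameter count stays small relative to directly parameterizing a full spatiotemporal adjacency, which is the redundancy the abstract describes.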
On the Human3.6M, Carnegie Mellon University (CMU) Mocap, and 3D Poses in the Wild (3DPW) datasets, DSTD-GCN outperforms state-of-the-art methods by 3.9%-8.7% in prediction accuracy with 55.0%-96.9% fewer parameters. Code is available at https://github.com/Jaakk0F/DSTD-GCN.