College of Information Engineering, Capital Normal University, Beijing 100048, China.
Sensors (Basel). 2024 Jul 8;24(13):4422. doi: 10.3390/s24134422.
Three-dimensional human pose estimation focuses on generating 3D pose sequences from 2D videos. It has enormous potential in human-robot interaction, remote sensing, virtual reality, and computer vision. Existing state-of-the-art methods primarily explore spatial or temporal encoding to achieve 3D pose inference. However, these architectures exploit the independent effects of spatial and temporal cues on 3D pose estimation while neglecting their spatial-temporal synergy. To address this issue, this paper proposes a novel 3D pose estimation method with a dual-adaptive spatial-temporal former (DASTFormer) and an additional supervised training strategy. The DASTFormer contains attention-adaptive (AtA) and pure-adaptive (PuA) modes, which enhance pose inference from 2D to 3D by adaptively learning spatial-temporal effects, considering both their cooperative and independent influences. In addition, an additional supervised training scheme with a batch variance loss is proposed. Unlike the common training strategy, it performs a two-round parameter update on the same batch of data. This not only better explores the potential relationship between spatial-temporal encoding and 3D poses, but also alleviates the batch-size limitations that GPU memory imposes on transformer-based frameworks. Extensive experimental results show that the proposed method significantly outperforms most state-of-the-art approaches on the Human3.6M and HumanEva datasets.
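The two-round parameter update on the same batch can be illustrated with a minimal sketch: a toy linear model trained with plain NumPy gradient descent, where each batch is used for two consecutive updates. The model, the mean-squared-error loss, and the learning rate here are illustrative assumptions; the paper's actual DASTFormer losses (including the batch variance loss) are not specified in this abstract.

```python
import numpy as np

def mse_grad(w, X, y):
    # Gradient of the mean-squared error for a linear model y_hat = X @ w.
    return 2.0 * X.T @ (X @ w - y) / len(y)

def two_round_step(w, X, y, lr=0.1):
    """Perform TWO parameter updates on the SAME batch, echoing the
    paper's additional supervised training (loss details hypothetical)."""
    w = w - lr * mse_grad(w, X, y)   # round 1: standard supervised update
    w = w - lr * mse_grad(w, X, y)   # round 2: second pass on the same batch
    return w

# Small fixed, well-conditioned design matrix as a stand-in for a batch
# of 2D pose features; the targets play the role of 3D pose supervision.
X = np.array([[1., 0., 0.],
              [0., 1., 0.],
              [0., 0., 1.],
              [1., 1., 1.]])
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true

w = np.zeros(3)
for _ in range(200):          # 200 batches -> 400 parameter updates
    w = two_round_step(w, X, y)
print(w)
```

Because each batch drives two optimizer steps, an effectively larger number of updates is extracted from the same GPU-resident data, which is one plausible reading of how this strategy eases batch-size constraints.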