HDPose: Post-Hierarchical Diffusion with Conditioning for 3D Human Pose Estimation.

Affiliations

Department of Information and Communications Engineering, Sejong University, Seoul 05006, Republic of Korea.

Department of Electrical Engineering, Sejong University, Seoul 05006, Republic of Korea.

Publication information

Sensors (Basel). 2024 Jan 26;24(3):829. doi: 10.3390/s24030829.

DOI: 10.3390/s24030829

PMID: 38339546

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC10856994/

Abstract

Recently, monocular 3D human pose estimation (HPE) methods have been used to predict 3D poses accurately by addressing the ill-posed problem caused by 3D-to-2D projection. However, monocular 3D HPE remains challenging owing to inherent depth ambiguity and occlusions. To address this issue, previous studies have proposed approaches based on denoising diffusion probabilistic models (DDPMs) that learn to reconstruct the correct 3D pose from a noisy initial 3D pose. These approaches additionally condition the model on 2D keypoints or on context encoders that encode spatial and temporal information. However, they often fall short of peak performance or take a long time to converge to the target pose. In this paper, we propose HDPose, which converges rapidly and predicts 3D poses accurately. Our approach aggregates spatial and temporal information from the condition into the denoising model through a hierarchical structure. Among the various conditioning structures we examined, the post-hierarchical structure achieved the best performance. We evaluated our model on the widely used Human3.6M and MPI-INF-3DHP datasets. The proposed model performs competitively with state-of-the-art models, achieving high accuracy and faster convergence while being considerably more lightweight.
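The abstract describes diffusion-based pose estimation only at a high level: a model learns to recover a clean 3D pose from a noisy one while being conditioned on 2D keypoints. The following is a minimal sketch of a generic DDPM forward and reverse process over a single 17-joint pose. It is not HDPose. The linear beta schedule, T = 50 steps, the choice of a denoiser that predicts x_0 directly, and the hypothetical `placeholder_denoiser` (which simply lifts each 2D keypoint to zero depth so the loop can run) are all illustrative assumptions. According to the abstract, HDPose's hierarchical aggregation of spatial and temporal information would sit inside the learned denoising model, which is not reproduced here.

```python
import math
import random

NUM_JOINTS = 17  # assumption: a Human3.6M-style 17-joint skeleton
T = 50           # assumption: number of diffusion steps, chosen for illustration

# Assumption: linear beta schedule as in the original DDPM paper.
betas = [1e-4 + (0.02 - 1e-4) * i / (T - 1) for i in range(T)]
alphas = [1.0 - b for b in betas]
alpha_bars = []  # cumulative products: alpha_bar_t = prod_{s<=t} alpha_s
running = 1.0
for a in alphas:
    running *= a
    alpha_bars.append(running)


def q_sample(x0, t, noise):
    """Forward process q(x_t | x_0): scale the clean pose and add Gaussian noise."""
    a = math.sqrt(alpha_bars[t])
    s = math.sqrt(1.0 - alpha_bars[t])
    return [[a * p + s * n for p, n in zip(row, nrow)] for row, nrow in zip(x0, noise)]


def placeholder_denoiser(x_t, t, keypoints_2d):
    """Hypothetical stand-in for a trained network predicting x_0 from
    (x_t, t, 2D-keypoint condition). It ignores x_t and t and lifts each
    2D keypoint to depth 0; a real model would be learned from data."""
    return [[u, v, 0.0] for u, v in keypoints_2d]


def p_sample_loop(denoiser, keypoints_2d, rng):
    """Reverse process: start from pure noise, repeatedly predict x_0 and draw
    x_{t-1} from the DDPM posterior q(x_{t-1} | x_t, x0_hat)."""
    x_t = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in keypoints_2d]
    for t in range(T - 1, 0, -1):
        x0_hat = denoiser(x_t, t, keypoints_2d)
        ab_t, ab_prev = alpha_bars[t], alpha_bars[t - 1]
        # Posterior mean coefficients and standard deviation (DDPM, Eq. 7).
        c0 = math.sqrt(ab_prev) * betas[t] / (1.0 - ab_t)
        ct = math.sqrt(alphas[t]) * (1.0 - ab_prev) / (1.0 - ab_t)
        std = math.sqrt(betas[t] * (1.0 - ab_prev) / (1.0 - ab_t))
        x_t = [[c0 * p0 + ct * pt + std * rng.gauss(0.0, 1.0)
                for p0, pt in zip(r0, rt)] for r0, rt in zip(x0_hat, x_t)]
    # At t = 0 the x_0 prediction is returned directly as the estimated 3D pose.
    return denoiser(x_t, 0, keypoints_2d)


rng = random.Random(0)
kp2d = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(NUM_JOINTS)]
pose3d = p_sample_loop(placeholder_denoiser, kp2d, rng)
print(len(pose3d), len(pose3d[0]))  # 17 3
```

Predicting x_0 directly, rather than the added noise, keeps the sketch short because the posterior mean can be written in terms of the predicted pose. The number of reverse steps T is the main lever on sampling time, which is one reason convergence speed matters for diffusion-based pose estimators.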
