IEEE Trans Pattern Anal Mach Intell. 2023 Dec;45(12):14639-14652. doi: 10.1109/TPAMI.2023.3313311. Epub 2023 Nov 3.
Despite the impressive results achieved by deep-learning-based 3D reconstruction, techniques for directly learning to model 4D human captures with detailed geometry have been less studied. This work presents a novel neural compositional representation for Human 4D Modeling with transformER (H4MER). Specifically, H4MER is a compact, compositional representation for dynamic humans that exploits the human body prior from the widely used SMPL parametric model. H4MER can thus represent a dynamic 3D human over a temporal span with codes for shape, initial pose, motion, and auxiliaries. A simple yet effective linear motion model provides a rough, regularized motion estimate, followed by per-frame compensation for pose and geometry details via residuals encoded in the auxiliary codes. We further present a Transformer-based feature extractor and a conditional GRU decoder to facilitate learning and improve the representation capability. Extensive experiments demonstrate that our method not only recovers dynamic humans with accurate motion and detailed geometry, but is also amenable to various 4D-human-related tasks, including monocular video fitting, motion retargeting, 4D completion, and future prediction.
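The decomposition described above — a linear motion model giving a coarse pose trajectory, plus per-frame residual compensation — can be illustrated with a minimal sketch. This is not the paper's implementation: the function name `decode_poses`, the 72-dimensional SMPL axis-angle pose, and the constant per-frame velocity parameterization are illustrative assumptions.

```python
import numpy as np

def decode_poses(initial_pose, motion, residuals):
    """Coarse linear motion plus per-frame residual compensation (illustrative).

    initial_pose: (72,)   SMPL-style axis-angle pose at frame 0
    motion:       (72,)   constant per-frame pose velocity (the linear model)
    residuals:    (T, 72) per-frame corrections, as decoded from auxiliary codes
    """
    T = residuals.shape[0]
    t = np.arange(T)[:, None]                            # (T, 1) frame indices
    coarse = initial_pose[None, :] + t * motion[None, :] # linear extrapolation over time
    return coarse + residuals                            # per-frame compensation

# Toy example: zero initial pose, small constant velocity, zero residuals.
T = 4
init = np.zeros(72)
vel = np.full(72, 0.01)
res = np.zeros((T, 72))
poses = decode_poses(init, vel, res)
print(poses.shape)  # (4, 72)
```

The residual term is what allows the compact linear model to stay regularized while the auxiliary codes absorb the fine, non-linear pose and geometry details.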