Wang Kangkan, Wang Chong, Yang Jian, Zhang Guofeng
IEEE Trans Image Process. 2025;34:5200-5214. doi: 10.1109/TIP.2025.3592534.
Capturing the human body and clothing from videos has made significant progress in recent years, but several challenges remain. Previous methods either reconstruct 3D bodies and garments from videos of self-rotating human motions or capture the body and clothing separately with neural implicit fields. However, methods designed for self-rotating motions can produce unstable tracking on dynamic videos with arbitrary human motions, while implicit-field-based methods suffer from inefficient rendering and low-quality synthesis. To address these problems, we propose a new method, called CloCap-GS, for clothed human performance capture with 3D Gaussian Splatting. Specifically, we align 3D Gaussians with the deforming geometries of the body and clothing, and leverage photometric constraints, formed by matching Gaussian renderings against input video frames, to recover temporal deformations of the dense template geometry. The geometric deformations and Gaussian properties of both the body and clothing are optimized jointly, achieving dense geometry tracking and novel-view synthesis simultaneously. In addition, we introduce a physics-aware, material-varying cloth model, pre-trained in a self-supervised manner without requiring any prepared training data, to preserve physically plausible cloth dynamics and body-clothing interactions. Compared with existing methods, our method improves the accuracy of dense geometry tracking and the quality of novel-view synthesis for a variety of daily garment types (e.g., loose clothes). Extensive quantitative and qualitative experiments demonstrate the effectiveness of CloCap-GS on real sparse-view or monocular videos.
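The joint photometric optimization described in the abstract can be sketched as follows: Gaussians attached to a template geometry are rendered, compared against a video frame with a photometric loss, and the per-frame deformation plus Gaussian properties are updated by gradient descent. This is a minimal illustrative sketch only, not the authors' implementation: the toy isotropic splatter, pinhole camera, tensor shapes, and all variable names below are assumptions standing in for a real 3DGS rasterizer and the paper's body/clothing template.

```python
import torch

# Toy setup: all sizes, the camera model, and the isotropic splatter are
# illustrative assumptions, not the paper's actual pipeline.
H, W, F = 64, 64, 80.0    # image height/width, pinhole focal length (assumed)
N = 300                   # Gaussians attached to the template (assumed)

# Template Gaussian means in camera space (stand-in for tracked geometry).
base_xyz = torch.randn(N, 3) * 0.3 + torch.tensor([0.0, 0.0, 3.0])

# Jointly optimized quantities: per-frame temporal deformation of the
# template and per-Gaussian appearance/scale properties.
deform = torch.zeros(N, 3, requires_grad=True)
colors = torch.rand(N, 3, requires_grad=True)
log_scale = torch.full((N,), 0.7, requires_grad=True)   # ~2 px footprint

def render(xyz, colors, log_scale):
    """Toy differentiable splatter: project means with a pinhole camera and
    blend isotropic 2D Gaussians (a stand-in for a real 3DGS rasterizer)."""
    u = F * xyz[:, 0] / xyz[:, 2] + W / 2                # pixel x of each mean
    v = F * xyz[:, 1] / xyz[:, 2] + H / 2                # pixel y of each mean
    ys, xs = torch.meshgrid(torch.arange(H, dtype=torch.float32),
                            torch.arange(W, dtype=torch.float32),
                            indexing="ij")
    d2 = (xs[None] - u[:, None, None])**2 + (ys[None] - v[:, None, None])**2
    w = torch.exp(-d2 / (2.0 * torch.exp(log_scale)[:, None, None]**2))
    wsum = w.sum(0) + 1e-8                               # normalized blend
    return (w[..., None] * colors[:, None, None, :]).sum(0) / wsum[..., None]

# Stand-in "observed frame": the template rendered under a known rigid shift,
# so the optimizer must recover it through the photometric constraint alone.
with torch.no_grad():
    target = render(base_xyz + torch.tensor([0.05, -0.03, 0.0]),
                    torch.rand(N, 3), torch.full((N,), 0.7))

opt = torch.optim.Adam([deform, colors, log_scale], lr=1e-2)
for step in range(100):
    opt.zero_grad()
    pred = render(base_xyz + deform, colors, log_scale)
    loss = (pred - target).abs().mean()   # photometric L1 against the frame
    loss.backward()
    opt.step()
print(f"final photometric loss: {loss.item():.4f}")
```

In the paper this photometric objective additionally drives the clothing deformation and is coupled with the physics-aware cloth model; a production version would replace the toy splatter with a proper 3DGS rasterizer using anisotropic covariances, opacities, and depth-ordered alpha blending.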