Gao Xiangjun, Yang Jiaolong, Kim Jongyoo, Peng Sida, Liu Zicheng, Tong Xin
IEEE Trans Pattern Anal Mach Intell. 2025 Aug;47(8):6110-6121. doi: 10.1109/TPAMI.2022.3205910.
Rapid progress has recently been made on 3D human rendering, including novel view synthesis and pose animation, driven by advances in neural radiance fields (NeRF). However, most existing methods focus on person-specific training, which typically requires multi-view videos. This article addresses a new, challenging task: rendering novel views and novel poses of a person unseen during training, using only multi-view still images as input, without videos. For this task, we propose a simple yet surprisingly effective method that trains a generalizable NeRF with multi-view images as conditional input. The key ingredient is a dedicated representation combining a canonical NeRF with a volume deformation scheme. Using a canonical space enables our method to learn shared properties of the human body and generalize easily to different people. Volume deformation connects the canonical space with the input and target images and queries image features for radiance and density prediction. We leverage a parametric 3D human model fitted to the input images to derive the deformation, which works well in practice when combined with our canonical NeRF. Experiments on both real and synthetic data, covering novel view synthesis and pose animation, collectively demonstrate the efficacy of our method.
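The abstract describes a deformation that maps points between a posed (target) space and a canonical space, derived from a parametric 3D human model fitted to the input images. A common way to realize such a mapping is inverse linear blend skinning over the body model's joint transforms. The sketch below is a minimal, hedged illustration of that idea in NumPy; the function name and argument layout are assumptions for illustration, not the paper's actual interface.

```python
import numpy as np

def inverse_lbs(x_posed, joint_transforms, skin_weights):
    """Map a 3D point from posed (target) space back to canonical space
    via inverse linear blend skinning (a common deformation scheme for
    parametric human body models; hypothetical interface).

    x_posed:          (3,) point sampled in the posed space
    joint_transforms: (J, 4, 4) per-joint canonical-to-posed transforms
    skin_weights:     (J,) blend weights for this point (sum to 1)
    """
    # Blend the per-joint rigid transforms with the skinning weights,
    # then apply the inverse of the blended transform to the point.
    blended = np.einsum("j,jab->ab", skin_weights, joint_transforms)
    x_h = np.append(x_posed, 1.0)                # homogeneous coordinates
    x_canonical = np.linalg.solve(blended, x_h)  # inverse transform
    return x_canonical[:3]
```

In a pipeline like the one described, the canonicalized point would then be fed to the canonical NeRF (together with image features gathered from the input views) to predict radiance and density.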