Zhang Chenxu, Ni Saifeng, Fan Zhipeng, Li Hongbo, Zeng Ming, Budagavi Madhukar, Guo Xiaohu
IEEE Trans Vis Comput Graph. 2023 Feb;29(2):1438-1449. doi: 10.1109/TVCG.2021.3117484. Epub 2022 Dec 29.
Recently, we have witnessed a boom in applications for 3D talking face generation. However, most existing 3D face generation methods can only generate 3D faces with a static head pose, which is inconsistent with how humans perceive faces. Only a few articles focus on head pose generation, and even these ignore personality as an attribute of the pose. In this article, we propose a unified audio-driven approach to endow 3D talking faces with personalized pose dynamics. To achieve this goal, we establish an original person-specific dataset, providing corresponding head poses and face shapes for each video. Our framework is composed of two separate modules: PoseGAN and PGFace. Given an input audio, PoseGAN first produces a head pose sequence for the 3D head, and then PGFace utilizes the audio and pose information to generate natural face models. By combining these two parts, a 3D talking head with dynamic head movement can be constructed. Experimental evidence indicates that our method can generate person-specific head pose sequences that are in sync with the input audio and that best match the human experience of talking heads.
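The two-stage pipeline described above can be sketched in code. This is a minimal illustrative stand-in, not the authors' implementation: the feature dimensions, the linear "networks," and the vertex count (5023, a common face-mesh size) are all placeholder assumptions; only the data flow — audio → PoseGAN → pose sequence, then (audio, poses) → PGFace → per-frame face models — follows the abstract.

```python
import numpy as np

rng = np.random.default_rng(0)

def pose_gan(audio_feats):
    """Hypothetical stand-in for PoseGAN: maps per-frame audio features
    to a head-pose sequence (e.g., rotation + translation parameters)."""
    W = rng.standard_normal((audio_feats.shape[1], 6)) * 0.01  # placeholder weights
    return audio_feats @ W  # (T, 6): one pose vector per frame

def pg_face(audio_feats, poses, n_vertices=5023):
    """Hypothetical stand-in for PGFace: conditions face-shape generation
    on both the audio features and the generated pose sequence."""
    x = np.concatenate([audio_feats, poses], axis=1)  # joint conditioning
    W = rng.standard_normal((x.shape[1], n_vertices * 3)) * 0.001
    return (x @ W).reshape(len(x), n_vertices, 3)  # per-frame 3D vertices

# Toy driving signal: 25 frames of 80-dim audio features (e.g., mel bands).
audio = rng.standard_normal((25, 80))
poses = pose_gan(audio)        # stage 1: personalized head-pose sequence
faces = pg_face(audio, poses)  # stage 2: pose-aware 3D face models
print(poses.shape, faces.shape)
```

Keeping the two modules separate, as the paper does, lets the pose generator be trained per person on the person-specific dataset while the face generator stays conditioned on both signals.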