Pei Yuru, Zha Hongbin
National Laboratory on Machine Perception, Peking University, Haidian District, Beijing, China.
IEEE Trans Vis Comput Graph. 2007 Jan-Feb;13(1):58-69. doi: 10.1109/TVCG.2007.22.
We present a novel method for transferring speech animation recorded in low-quality videos to high-resolution 3D face models. The basic idea is to synthesize the animated faces by interpolation over a small set of 3D key face shapes that span a 3D face space. The 3D key shapes are extracted by an unsupervised learning process in the 2D video space, which yields a set of 2D visemes that are then mapped to the 3D face space. The learning process consists of two main phases: 1) Isomap-based nonlinear dimensionality reduction to embed the video speech movements into a low-dimensional manifold and 2) K-means clustering in the low-dimensional space to extract the 2D key viseme frames. Our main contribution is the use of this Isomap-based learning method to extract the intrinsic geometry of the speech video space, which makes it possible to define the 3D key viseme shapes; to do so, we need to capture only a limited number of 3D key face models with a general-purpose 3D scanner. Moreover, we develop a skull-movement recovery method based on simple anatomical structures to enhance the 3D realism of local mouth movements. Experimental results show that our method achieves realistic 3D animation effects with a small number of 3D key face models.
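A minimal sketch of the two-phase learning pipeline described above, using scikit-learn's Isomap and KMeans as stand-ins for the paper's components. The frame features, neighborhood size, embedding dimension, and viseme count are illustrative assumptions, not values from the paper, and the convex-weight blending at the end is only one plausible way to realize the interpolation the abstract mentions.

```python
import numpy as np
from sklearn.manifold import Isomap
from sklearn.cluster import KMeans


def extract_key_visemes(frames, n_neighbors=10, n_components=3, n_visemes=16):
    """Embed 2D speech frames on a low-dimensional manifold (Isomap),
    cluster the embedding (K-means), and return the index of the frame
    nearest each cluster centroid as a 2D key viseme.

    frames: array of shape (n_frames, ...) holding per-frame features,
    e.g. mouth-region pixels or tracked landmarks (an assumption here).
    """
    X = frames.reshape(len(frames), -1).astype(np.float64)  # one row per frame

    # Phase 1: nonlinear dimensionality reduction to recover the
    # intrinsic geometry of the video speech space.
    embedding = Isomap(n_neighbors=n_neighbors,
                       n_components=n_components).fit_transform(X)

    # Phase 2: K-means clustering in the low-dimensional space.
    km = KMeans(n_clusters=n_visemes, n_init=10, random_state=0).fit(embedding)

    # Select the actual frame closest to each centroid as a key viseme;
    # these are the frames one would pair with scanned 3D key shapes.
    key_indices = [int(np.argmin(np.linalg.norm(embedding - c, axis=1)))
                   for c in km.cluster_centers_]
    return key_indices, embedding


def synthesize_face(weights, key_shapes):
    """Blend 3D key face shapes with convex weights.

    weights: (n_keys,) nonnegative blending weights for a given frame.
    key_shapes: (n_keys, n_vertices, 3) scanned 3D key face models.
    Returns an interpolated face of shape (n_vertices, 3).
    """
    w = np.asarray(weights, dtype=np.float64)
    w = w / w.sum()  # normalize so the blend stays in the face space
    return np.tensordot(w, key_shapes, axes=1)
```

In this reading, each selected 2D key viseme is matched to a 3D key face model captured with the scanner, and a new video frame is animated by deriving blending weights from its position in the embedding (e.g., from distances to the key frames) and interpolating the corresponding 3D shapes.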