

Transferring of speech movements from video to 3D face space.

Authors

Pei Yuru, Zha Hongbin

Affiliation

National Laboratory on Machine Perception, Peking University, Haidian District, Beijing, China.

Publication

IEEE Trans Vis Comput Graph. 2007 Jan-Feb;13(1):58-69. doi: 10.1109/TVCG.2007.22.

DOI: 10.1109/TVCG.2007.22
PMID: 17093336
Abstract

We present a novel method for transferring speech animation recorded in low quality videos to high resolution 3D face models. The basic idea is to synthesize the animated faces by an interpolation based on a small set of 3D key face shapes which span a 3D face space. The 3D key shapes are extracted by an unsupervised learning process in 2D video space to form a set of 2D visemes which are then mapped to the 3D face space. The learning process consists of two main phases: 1) Isomap-based nonlinear dimensionality reduction to embed the video speech movements into a low-dimensional manifold and 2) K-means clustering in the low-dimensional space to extract 2D key viseme frames. Our main contribution is that we use the Isomap-based learning method to extract intrinsic geometry of the speech video space and thus to make it possible to define the 3D key viseme shapes. To do so, we need only to capture a limited number of 3D key face models by using a general 3D scanner. Moreover, we also develop a skull movement recovery method based on simple anatomical structures to enhance 3D realism in local mouth movements. Experimental results show that our method can achieve realistic 3D animation effects with a small number of 3D key face models.
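The two-phase learning process in the abstract (Isomap embedding of video frames, then K-means to pick key viseme frames) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the random array stands in for flattened mouth-region video frames, and the neighbor count, embedding dimension, and number of visemes are assumed values chosen for the sketch.

```python
# Minimal sketch (not the paper's code) of the two-phase learning process:
# (1) Isomap embeds video speech frames into a low-dimensional manifold;
# (2) K-means in that space selects 2D key viseme frames (cluster medoids),
#     which the paper then maps to scanned 3D key face shapes.
import numpy as np
from sklearn.manifold import Isomap
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Stand-in data: 200 video frames, each flattened to a 1200-dim feature vector.
frames = rng.normal(size=(200, 1200))

# Phase 1: nonlinear dimensionality reduction onto a low-dimensional manifold.
embedding = Isomap(n_neighbors=10, n_components=3).fit_transform(frames)

# Phase 2: cluster the embedded frames; the frame nearest each cluster
# centroid serves as a 2D key viseme frame.
n_visemes = 8  # assumed; the paper uses a small set of key shapes
km = KMeans(n_clusters=n_visemes, n_init=10, random_state=0).fit(embedding)
key_idx = [
    int(np.argmin(np.linalg.norm(embedding - c, axis=1)))
    for c in km.cluster_centers_
]
print(sorted(set(key_idx)))  # indices of the selected key viseme frames
```

Selecting medoids (real frames nearest the centroids) rather than raw centroids matters here: each key viseme must correspond to an actual capturable face, since a 3D scan is taken for each one.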


Similar Articles

1. Transferring of speech movements from video to 3D face space.
   IEEE Trans Vis Comput Graph. 2007 Jan-Feb;13(1):58-69. doi: 10.1109/TVCG.2007.22.
2. Accurate visible speech synthesis based on concatenating variable length motion capture data.
   IEEE Trans Vis Comput Graph. 2006 Mar-Apr;12(2):266-76. doi: 10.1109/TVCG.2006.18.
3. Creating speech-synchronized animation.
   IEEE Trans Vis Comput Graph. 2005 May-Jun;11(3):341-52. doi: 10.1109/TVCG.2005.43.
4. A video database of moving faces and people.
   IEEE Trans Pattern Anal Mach Intell. 2005 May;27(5):812-6. doi: 10.1109/TPAMI.2005.90.
5. Stroke surfaces: temporally coherent artistic animations from video.
   IEEE Trans Vis Comput Graph. 2005 Sep-Oct;11(5):540-9. doi: 10.1109/TVCG.2005.85.
6. Expressive facial animation synthesis by learning speech coarticulation and expression spaces.
   IEEE Trans Vis Comput Graph. 2006 Nov-Dec;12(6):1523-34. doi: 10.1109/TVCG.2006.90.
7. High-quality animation of 2D steady vector fields.
   IEEE Trans Vis Comput Graph. 2004 Jan-Feb;10(1):2-14. doi: 10.1109/TVCG.2004.1260754.
8. Intraclass retrieval of nonrigid 3D objects: application to face recognition.
   IEEE Trans Pattern Anal Mach Intell. 2007 Feb;29(2):218-29. doi: 10.1109/TPAMI.2007.37.
9. Riemannian manifold learning.
   IEEE Trans Pattern Anal Mach Intell. 2008 May;30(5):796-809. doi: 10.1109/TPAMI.2007.70735.
10. Model-based tracking by classification in a tiny discrete pose space.
   IEEE Trans Pattern Anal Mach Intell. 2007 Jun;29(6):976-89. doi: 10.1109/TPAMI.2007.1088.