Deng Zhigang, Neumann Ulrich, Lewis J P, Kim Tae-Yong, Bulut Murtaza, Narayanan Shrikanth
Department of Computer Science, University of Houston, TX 77004, USA.
IEEE Trans Vis Comput Graph. 2006 Nov-Dec;12(6):1523-34. doi: 10.1109/TVCG.2006.90.
Synthesizing expressive facial animation is a very challenging topic within the graphics community. In this paper, we present an expressive facial animation synthesis system enabled by automated learning from facial motion capture data. Accurate 3D motions of markers on the face of a human subject are captured while the subject recites a predesigned corpus with specific spoken and visual expressions. We present a novel motion capture mining technique that "learns" speech coarticulation models for diphones and triphones from the recorded data. A Phoneme-Independent Expression Eigenspace (PIEES) that encloses the dynamic expression signals is constructed by motion signal processing (phoneme-based time-warping and subtraction) and Principal Component Analysis (PCA) reduction. New expressive facial animations are synthesized as follows: first, the learned coarticulation models are concatenated to synthesize neutral visual speech from novel speech input; then a texture-synthesis-based approach is used to generate a novel dynamic expression signal from the PIEES model; finally, the synthesized expression signal is blended with the synthesized neutral visual speech to create the final expressive facial animation. Our experiments demonstrate that the system can effectively synthesize realistic expressive facial animation.
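The sketch below illustrates, under simplifying assumptions, the two steps of the pipeline that the abstract describes most concretely: building the PIEES by subtracting time-warped neutral motion from expressive motion and applying PCA, and blending a synthesized expression signal back onto synthesized neutral visual speech. The function names, array shapes, the use of SVD for the PCA step, and the simple additive weighted blend are illustrative assumptions, not the paper's exact formulation; the coarticulation-model concatenation and the texture-synthesis-based generation of new expression coefficients are not shown.

```python
import numpy as np

def build_piees(expressive_frames, neutral_frames, num_components=8):
    """Sketch of Phoneme-Independent Expression Eigenspace (PIEES) construction.

    expressive_frames, neutral_frames: arrays of shape (T, 3 * num_markers),
    assumed already phoneme-aligned and time-warped so frame t of the
    expressive take corresponds to frame t of the neutral take.
    Returns the mean, PCA basis, and low-dimensional expression trajectory.
    """
    # Subtract the neutral visual speech to isolate the dynamic expression signal.
    expression_signal = expressive_frames - neutral_frames

    # PCA reduction: center the residual and keep the leading principal directions.
    mean = expression_signal.mean(axis=0)
    centered = expression_signal - mean
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[:num_components]        # (num_components, 3 * num_markers)
    coeffs = centered @ basis.T        # expression trajectory in the eigenspace
    return mean, basis, coeffs

def blend_expression(neutral_speech_frames, mean, basis, new_coeffs, weight=1.0):
    """Blend a synthesized expression signal onto synthesized neutral visual speech.

    neutral_speech_frames: (T2, 3 * num_markers) synthesized neutral motion.
    new_coeffs: (T2, num_components) expression trajectory produced elsewhere
    (in the paper, by a texture-synthesis-based sampling of the PIEES).
    """
    expression = mean + new_coeffs @ basis          # back-project to marker space
    return neutral_speech_frames + weight * expression
```

As a usage note under the same assumptions, one would call `build_piees` once per expression category on aligned capture data, synthesize a new coefficient sequence of the desired length, and pass it to `blend_expression` together with the concatenated neutral visual speech to obtain the final expressive marker motion.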