
Real-time speech-driven face animation with expressions using neural networks.

Authors

Hong Pengyu, Wen Zhen, Huang T S

Affiliation

Beckman Institute for Advanced Science and Technology, University of Illinois, Urbana, IL, USA.

Publication

IEEE Trans Neural Netw. 2002;13(4):916-27. doi: 10.1109/TNN.2002.1021892.

Abstract

A real-time speech-driven synthetic talking face provides an effective multimodal communication interface in distributed collaboration environments. Nonverbal gestures such as facial expressions are important to human communication and should be considered by speech-driven face animation systems. In this paper, we present a framework that systematically addresses facial deformation modeling, automatic facial motion analysis, and real-time speech-driven face animation with expression using neural networks. Based on this framework, we learn a quantitative visual representation of the facial deformations, called the motion units (MUs). A facial deformation can be approximated by a linear combination of the MUs weighted by MU parameters (MUPs). We develop an MU-based facial motion tracking algorithm which is used to collect an audio-visual training database. Then, we construct a real-time audio-to-MUP mapping by training a set of neural networks using the collected audio-visual training database. The quantitative evaluation of the mapping shows the effectiveness of the proposed approach. Using the proposed method, we develop the functionality of real-time speech-driven face animation with expressions for the iFACE system. Experimental results show that the synthetic expressive talking face of the iFACE system is comparable with a real face in terms of the effectiveness of their influences on bimodal human emotion perception.
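The abstract's deformation model — a facial deformation approximated as a linear combination of motion units (MUs) weighted by MU parameters (MUPs) — can be sketched as follows. This is a minimal illustration of the linear model only; the dimensions, basis vectors, and weights below are invented for the example and are not taken from the paper.

```python
def deform(mean_shape, mus, mups):
    """Approximate a facial deformation as the mean (neutral) shape plus
    a linear combination of motion units (MUs), each weighted by its
    MU parameter (MUP)."""
    assert len(mus) == len(mups), "one MUP per MU"
    shape = list(mean_shape)
    for mu, weight in zip(mus, mups):
        assert len(mu) == len(shape), "each MU spans the full shape vector"
        for i, component in enumerate(mu):
            shape[i] += weight * component
    return shape

# Toy example: a 2-component shape vector and two hypothetical MUs.
mean = [0.0, 0.0]
mus = [[1.0, 0.0],   # illustrative MU basis vectors
       [0.0, 1.0]]
mups = [0.5, -0.2]   # MUPs, e.g. estimated by tracking or predicted from audio
print(deform(mean, mus, mups))  # [0.5, -0.2]
```

In the paper's pipeline, the same MUPs serve as the shared visual representation: the tracking algorithm estimates them from video to build the training set, and the trained neural networks predict them from audio at runtime.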

