IEEE Comput Graph Appl. 2021 Jul-Aug;41(4):52-63. doi: 10.1109/MCG.2021.3068035. Epub 2021 Jul 15.
This article presents a hybrid animation approach that combines example-based and neural animation methods to create a simple, yet powerful animation regime for human faces. Example-based methods usually employ a database of prerecorded sequences that are concatenated or looped in order to synthesize novel animations. In contrast to this traditional example-based approach, we introduce a lightweight autoregressive network to transform our animation database into a parametric model. During training, our network learns the dynamics of facial expressions, which enables the replay of annotated sequences from our animation database as well as their seamless concatenation in a new order. This representation is especially useful for the synthesis of visual speech, where coarticulation creates interdependencies between adjacent visemes that affect their appearance. Instead of creating an exhaustive database that contains all viseme variants, we use our animation network to predict the correct appearance. This allows realistic synthesis of novel facial animation sequences like visual speech, but also of general facial expressions, in an example-based manner.
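The autoregressive replay-and-concatenation idea can be sketched as follows. This is a minimal illustrative sketch, not the paper's actual network: the architecture, parameter dimensions, and all names (`ar_step`, `synthesize`, `N_PARAMS`, etc.) are assumptions introduced here for illustration only.

```python
import numpy as np

# Hypothetical sketch of an autoregressive step that predicts the next frame
# of facial-expression parameters (e.g. blendshape weights) from the previous
# frame plus a label for the annotated sequence being replayed. Dimensions
# and the residual-update design are illustrative assumptions.

rng = np.random.default_rng(0)

N_PARAMS = 8    # expression parameters per frame (illustrative)
N_LABELS = 4    # annotated sequence / viseme labels (illustrative)
HIDDEN = 16

# Randomly initialized weights stand in for a trained lightweight network.
W1 = rng.standard_normal((N_PARAMS + N_LABELS, HIDDEN)) * 0.1
W2 = rng.standard_normal((HIDDEN, N_PARAMS)) * 0.1

def ar_step(prev_frame, label_onehot):
    """One autoregressive step: previous frame + sequence label -> next frame."""
    x = np.concatenate([prev_frame, label_onehot])
    h = np.tanh(x @ W1)
    return prev_frame + h @ W2  # residual update keeps the motion smooth

def synthesize(labels, n_frames_per_label, start):
    """Chain autoregressive steps to concatenate sequences in a new order."""
    frame = start
    out = []
    for lab in labels:
        onehot = np.eye(N_LABELS)[lab]
        for _ in range(n_frames_per_label):
            frame = ar_step(frame, onehot)
            out.append(frame)
    return np.stack(out)

# Replay three annotated sequences back to back; because each step is
# conditioned on the previous frame, transitions between labels stay smooth.
seq = synthesize([0, 2, 1], n_frames_per_label=5, start=np.zeros(N_PARAMS))
print(seq.shape)  # (15, 8)
```

Because each predicted frame feeds back as input to the next step, the transition between concatenated sequences is generated by the network itself rather than by crossfading database clips, which is what makes coarticulation-dependent viseme appearance predictable from context.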