Li Bo, Wei Xiaolin, Liu Bin, He Zhifen, Cao Junjie, Lai Yu-Kun
IEEE Trans Vis Comput Graph. 2025 Mar;31(3):1758-1771. doi: 10.1109/TVCG.2024.3371064. Epub 2025 Jan 30.
Most existing 3D talking face synthesis methods lack detailed facial expressions and realistic head poses, resulting in unsatisfactory experiences for users. In this article, we propose a pose-aware 3D talking face synthesis method built on a novel geometry-guided audio-vertices attention. To capture more detailed expressions, such as subtle nuances of mouth shape and eye movement, we build hierarchical audio features comprising a global attribute feature and a series of vertex-wise local latent movement features. Then, to fully exploit the topology of facial models, we propose a geometry-guided audio-vertices attention module that predicts the displacement of each vertex, using vertex connectivity relations to take full advantage of the corresponding hierarchical audio features. Finally, to achieve pose-aware animation, we extend an existing database with an additional pose attribute and propose a pose estimation module that attends to the whole head model. Numerical experiments demonstrate that the proposed method produces more realistic expressions and head movements than state-of-the-art methods.
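To make the core idea concrete, below is a minimal sketch of what a geometry-guided audio-to-vertex attention step could look like, assuming a PyTorch setting. This is not the authors' code: the class name, feature dimensions, the additive fusion of global and local audio features, and the adjacency-based attention mask are all illustrative assumptions drawn only from the abstract's description (hierarchical audio features plus attention restricted by vertex connectivity).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GeometryGuidedAttention(nn.Module):
    """Illustrative sketch: attention from vertices to per-vertex audio
    features, masked by mesh connectivity, predicting 3D displacements."""

    def __init__(self, vert_dim: int, audio_dim: int, hidden: int = 64):
        super().__init__()
        self.q = nn.Linear(vert_dim, hidden)   # queries from vertex geometry features
        self.k = nn.Linear(audio_dim, hidden)  # keys from fused audio features
        self.v = nn.Linear(audio_dim, hidden)
        self.out = nn.Linear(hidden, 3)        # per-vertex 3D displacement

    def forward(self, vert_feat, local_audio, global_audio, adj_mask):
        # vert_feat:    (V, vert_dim)   per-vertex geometry features
        # local_audio:  (V, audio_dim)  vertex-wise local latent movement features
        # global_audio: (audio_dim,)    global attribute feature, broadcast to all vertices
        # adj_mask:     (V, V) bool     True where vertices are topologically connected;
        #                               must include self-loops so every row attends somewhere
        audio = local_audio + global_audio            # simplest fusion of the hierarchy
        q, k, v = self.q(vert_feat), self.k(audio), self.v(audio)
        scores = (q @ k.t()) / q.shape[-1] ** 0.5     # (V, V) attention logits
        scores = scores.masked_fill(~adj_mask, float("-inf"))  # keep connected pairs only
        attn = F.softmax(scores, dim=-1)
        return self.out(attn @ v)                     # (V, 3) predicted displacements

# Toy usage with random inputs (V, dimensions are arbitrary placeholders):
V = 32
adj = torch.eye(V, dtype=torch.bool)                  # self-loops
adj |= torch.rand(V, V) > 0.8                         # random "edges" for illustration
module = GeometryGuidedAttention(vert_dim=16, audio_dim=32)
disp = module(torch.randn(V, 16), torch.randn(V, 32), torch.randn(32), adj)
print(disp.shape)  # torch.Size([32, 3])
```

Masking the attention logits with the mesh adjacency is one plausible reading of "using vertex connectivity relations"; the paper may instead use graph convolutions or multi-ring neighborhoods, which this sketch does not cover.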