

Expressive 3D Facial Animation Generation Based on Local-to-Global Latent Diffusion.

Author Information

Song Wenfeng, Wang Xuan, Jiang Yiming, Li Shuai, Hao Aimin, Hou Xia, Qin Hong

Publication Information

IEEE Trans Vis Comput Graph. 2024 Nov;30(11):7397-7407. doi: 10.1109/TVCG.2024.3456213. Epub 2024 Oct 10.

Abstract

3D facial animations, crucial to augmented and mixed reality digital media, have evolved from mere aesthetic elements to potent storytelling media. Despite considerable progress in facial animation of neutral emotions, existing methods still struggle to capture the authenticity of emotions. This paper introduces a novel approach to capturing fine facial expressions and generating facial animations synchronized with audio. Our method consists of two key components. First, a Local-to-global Latent Diffusion Model (LG-LDM) tailored for authentic facial expressions, which integrates audio, time step, facial expressions, and other conditions to encode emotionally rich latent features from possibly noisy raw audio signals. The core of LG-LDM is our carefully designed Facial Denoiser Model (FDM) for aligning local-to-global animation features with audio. Second, we redesign an Emotion-centric Vector Quantized-Variational AutoEncoder framework (EVQ-VAE) to finely decode the subtle differences among emotions and reconstruct the final 3D facial geometry. Our work addresses the key challenges of emotionally realistic, audio-synchronized 3D facial animation and enhances the immersive experience and emotional depth of augmented and mixed reality applications. We provide a reproducibility kit including our code, dataset, and detailed instructions for running the experiments. This kit is available at https://github.com/wangxuanx/Face-Diffusion-Model.
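The EVQ-VAE component decodes facial geometry via vector quantization, i.e. snapping each continuous latent to its nearest entry in a learned codebook before decoding. The following is a minimal NumPy sketch of that quantization step only; the shapes, codebook size, and variable names are illustrative assumptions, not the authors' implementation (which additionally conditions on emotion).

```python
import numpy as np

rng = np.random.default_rng(0)

K, D = 8, 4                            # codebook size, latent dimension (assumed)
codebook = rng.normal(size=(K, D))     # stand-in for learned embeddings e_k

def quantize(z_e):
    """Map each continuous encoder latent z_e[i] to its nearest codebook entry."""
    # Squared L2 distance between every latent and every code, via broadcasting.
    d = ((z_e[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)  # (N, K)
    idx = d.argmin(axis=1)             # index of the nearest code per latent
    z_q = codebook[idx]                # quantized latents passed to the decoder
    return z_q, idx

z_e = rng.normal(size=(5, D))          # pretend encoder outputs for 5 frames
z_q, idx = quantize(z_e)               # z_q: (5, 4), idx: (5,)
```

In a full VQ-VAE the gradient is passed straight through the quantization (z_q is non-differentiable), and commitment/codebook losses keep z_e close to the chosen codes; those training details are omitted here.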

