

Expressive 3D Facial Animation Generation Based on Local-to-Global Latent Diffusion.

Author Information

Song Wenfeng, Wang Xuan, Jiang Yiming, Li Shuai, Hao Aimin, Hou Xia, Qin Hong

Publication Information

IEEE Trans Vis Comput Graph. 2024 Nov;30(11):7397-7407. doi: 10.1109/TVCG.2024.3456213. Epub 2024 Oct 10.

Abstract

3D facial animations, crucial to augmented and mixed reality digital media, have evolved from mere aesthetic elements to potent storytelling media. Despite considerable progress in facial animation of neutral emotions, existing methods still struggle to capture the authenticity of emotions. This paper introduces a novel approach to capturing fine facial expressions and generating facial animations synchronized with audio. Our method consists of two key components. First, a Local-to-global Latent Diffusion Model (LG-LDM) tailored for authentic facial expressions, which integrates audio, time step, facial expressions, and other conditions to encode emotionally rich latent features from possibly noisy raw audio signals. The core of LG-LDM is our carefully designed Facial Denoiser Model (FDM) for aligning local-to-global animation features with audio. Second, we redesign an Emotion-centric Vector Quantized-Variational AutoEncoder framework (EVQ-VAE) to finely decode the subtle differences among emotions and reconstruct the final 3D facial geometry. Our work addresses the key challenges of emotionally realistic, audio-synchronized 3D facial animation and enhances the immersive experience and emotional depth of augmented and mixed reality applications. We provide a reproducibility kit including our code, dataset, and detailed instructions for running the experiments. This kit is available at https://github.com/wangxuanx/Face-Diffusion-Model.
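The EVQ-VAE component decodes facial geometry via vector quantization, i.e. snapping each continuous latent to its nearest entry in a learned codebook before decoding. The following is a minimal NumPy sketch of that quantization step only; the shapes, codebook size, and variable names are illustrative assumptions, not the authors' implementation (which additionally conditions on emotion).

```python
import numpy as np

rng = np.random.default_rng(0)

K, D = 8, 4                            # codebook size, latent dimension (assumed)
codebook = rng.normal(size=(K, D))     # stand-in for learned embeddings e_k

def quantize(z_e):
    """Map each continuous encoder latent z_e[i] to its nearest codebook entry."""
    # Squared L2 distance between every latent and every code, via broadcasting.
    d = ((z_e[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)  # (N, K)
    idx = d.argmin(axis=1)             # index of the nearest code per latent
    z_q = codebook[idx]                # quantized latents passed to the decoder
    return z_q, idx

z_e = rng.normal(size=(5, D))          # pretend encoder outputs for 5 frames
z_q, idx = quantize(z_e)               # z_q: (5, 4), idx: (5,)
```

In a full VQ-VAE the gradient is passed straight through the quantization (z_q is non-differentiable), and commitment/codebook losses keep z_e close to the chosen codes; those training details are omitted here.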

