3D Talking Face With Personalized Pose Dynamics.

Authors

Zhang Chenxu, Ni Saifeng, Fan Zhipeng, Li Hongbo, Zeng Ming, Budagavi Madhukar, Guo Xiaohu

Publication information

IEEE Trans Vis Comput Graph. 2023 Feb;29(2):1438-1449. doi: 10.1109/TVCG.2021.3117484. Epub 2022 Dec 29.

DOI:10.1109/TVCG.2021.3117484
PMID:34606458
Abstract

Recently, we have witnessed a boom in applications for 3D talking face generation. However, most existing 3D face generation methods can only generate 3D faces with a static head pose, which is inconsistent with how humans perceive faces. Only a few articles focus on head pose generation, but even these ignore the attribute of personality. In this article, we propose a unified audio-driven approach to endow 3D talking faces with personalized pose dynamics. To achieve this goal, we establish an original person-specific dataset, providing corresponding head poses and face shapes for each video. Our framework is composed of two separate modules: PoseGAN and PGFace. Given an input audio, PoseGAN first produces a head pose sequence for the 3D head, and then, PGFace utilizes the audio and pose information to generate natural face models. With the combination of these two parts, a 3D talking head with dynamic head movement can be constructed. Experimental evidence indicates that our method can generate person-specific head pose sequences that are in sync with the input audio and that best match with the human experience of talking heads.
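
The two-stage data flow described in the abstract (audio to head poses, then audio plus poses to face models) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the abstract does not specify feature layouts, pose parameterization, or any API, so the type aliases, `generate_talking_head`, and the two dummy models are hypothetical stand-ins and not the authors' PoseGAN or PGFace networks.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

# Hypothetical per-frame representations; the abstract does not define them.
AudioFrame = Sequence[float]   # audio features for one frame (assumed layout)
Pose = Sequence[float]         # head pose parameters for one frame (assumed)
FaceShape = Sequence[float]    # face-shape parameters for one frame (assumed)


@dataclass
class TalkingHeadFrame:
    pose: Pose
    face: FaceShape


def generate_talking_head(
    audio: List[AudioFrame],
    pose_model: Callable[[List[AudioFrame]], List[Pose]],
    face_model: Callable[[List[AudioFrame], List[Pose]], List[FaceShape]],
) -> List[TalkingHeadFrame]:
    """Two-stage flow: audio -> poses, then (audio, poses) -> face shapes."""
    poses = pose_model(audio)          # stage 1: the role PoseGAN plays
    faces = face_model(audio, poses)   # stage 2: the role PGFace plays
    if not (len(poses) == len(faces) == len(audio)):
        raise ValueError("each stage must return one output per audio frame")
    return [TalkingHeadFrame(p, f) for p, f in zip(poses, faces)]


# Placeholder models so the sketch runs; they are NOT trained networks.
def dummy_pose_model(audio: List[AudioFrame]) -> List[Pose]:
    # Three pose values per frame, derived trivially from the audio frame.
    return [[sum(frame), 0.0, 0.0] for frame in audio]


def dummy_face_model(audio: List[AudioFrame], poses: List[Pose]) -> List[FaceShape]:
    # One face parameter per frame, conditioned on both audio and pose.
    return [[frame[0] + pose[0]] for frame, pose in zip(audio, poses)]


frames = generate_talking_head(
    [[0.1, 0.2], [0.3, 0.4]], dummy_pose_model, dummy_face_model
)
```

The point of the sketch is only the dependency order: the face stage receives the pose sequence produced by the pose stage, so head movement and face shape are generated from the same audio input rather than independently.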

Similar articles

1. 3D Talking Face With Personalized Pose Dynamics.
IEEE Trans Vis Comput Graph. 2023 Feb;29(2):1438-1449. doi: 10.1109/TVCG.2021.3117484. Epub 2022 Dec 29.
2. StyleTalk++: A Unified Framework for Controlling the Speaking Styles of Talking Heads.
IEEE Trans Pattern Anal Mach Intell. 2024 Jun;46(6):4331-4347. doi: 10.1109/TPAMI.2024.3357808. Epub 2024 May 7.
3. Pose-Aware 3D Talking Face Synthesis Using Geometry-Guided Audio-Vertices Attention.
IEEE Trans Vis Comput Graph. 2025 Mar;31(3):1758-1771. doi: 10.1109/TVCG.2024.3371064. Epub 2025 Jan 30.
4. Learn2Talk: 3D Talking Face Learns from 2D Talking Face.
IEEE Trans Vis Comput Graph. 2024 Oct 7;PP. doi: 10.1109/TVCG.2024.3476275.
5. Talking Face Generation With Audio-Deduced Emotional Landmarks.
IEEE Trans Neural Netw Learn Syst. 2024 Oct;35(10):14099-14111. doi: 10.1109/TNNLS.2023.3274676. Epub 2024 Oct 7.
6. HeadFusion: 360 Head Pose Tracking Combining 3D Morphable Model and 3D Reconstruction.
IEEE Trans Pattern Anal Mach Intell. 2018 Nov;40(11):2653-2667. doi: 10.1109/TPAMI.2018.2841403. Epub 2018 May 29.
7. Head Pose Estimation through Keypoints Matching between Reconstructed 3D Face Model and 2D Image.
Sensors (Basel). 2021 Mar 6;21(5):1841. doi: 10.3390/s21051841.
8. Robust 3D Human Pose Estimation from Single Images or Video Sequences.
IEEE Trans Pattern Anal Mach Intell. 2019 May;41(5):1227-1241. doi: 10.1109/TPAMI.2018.2828427. Epub 2018 Apr 19.
9. An Efficient 3D Human Pose Retrieval and Reconstruction from 2D Image-Based Landmarks.
Sensors (Basel). 2021 Apr 1;21(7):2415. doi: 10.3390/s21072415.
10. LCR-Net++: Multi-Person 2D and 3D Pose Detection in Natural Images.
IEEE Trans Pattern Anal Mach Intell. 2020 May;42(5):1146-1161. doi: 10.1109/TPAMI.2019.2892985. Epub 2019 Jan 14.

Cited by

1. 2D facial landmark localization method for multi-view face synthesis image using a two-pathway generative adversarial network approach.
PeerJ Comput Sci. 2022 Feb 16;8:e897. doi: 10.7717/peerj-cs.897. eCollection 2022.