Memories are One-to-Many Mapping Alleviators in Talking Face Generation.

Author information

Tang Anni, He Tianyu, Tan Xu, Ling Jun, Li Runnan, Zhao Sheng, Bian Jiang, Song Li

Publication information

IEEE Trans Pattern Anal Mach Intell. 2024 Dec;46(12):8758-8770. doi: 10.1109/TPAMI.2024.3409380. Epub 2024 Nov 6.

Abstract

Talking face generation aims at generating photo-realistic video portraits of a target person driven by input audio. According to the nature of audio to lip motions mapping, the same speech content may have different appearances even for the same person at different occasions. Such one-to-many mapping problem brings ambiguity during training and thus causes inferior visual results. Although this one-to-many mapping could be alleviated in part by a two-stage framework (i.e., an audio-to-expression model followed by a neural-rendering model), it is still insufficient since the prediction is produced without enough information (e.g., emotions, wrinkles, etc.). In this paper, we propose MemFace to complement the missing information with an implicit memory and an explicit memory that follow the sense of the two stages respectively. More specifically, the implicit memory is employed in the audio-to-expression model to capture high-level semantics in the audio-expression shared space, while the explicit memory is employed in the neural-rendering model to help synthesize pixel-level details. Our experimental results show that our proposed MemFace surpasses all the state-of-the-art results across multiple scenarios consistently and significantly.
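
The abstract does not say how either memory is read, so the following is only a minimal sketch of one common way to let a model consult a learned memory: a generic key-value attention read. It is not the authors' implementation. The function names, the toy dimensions, and the choice to concatenate the retrieved vector onto the audio feature are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def memory_read(query, keys, values):
    """Generic key-value attention read (hypothetical helper, not from the paper).

    query:  feature vector, e.g. an audio feature
    keys:   one key vector per memory slot, same length as query
    values: one value vector per memory slot
    Returns the attention-weighted mix of values and the attention weights.
    """
    scale = math.sqrt(len(query))
    # Scaled dot-product similarity between the query and every memory key.
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    read = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return read, weights


# Toy memory with three slots (values chosen by hand, purely illustrative).
keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
values = [[10.0, 0.0, 0.0], [0.0, 10.0, 0.0], [0.0, 0.0, 10.0]]

audio_feature = [4.0, 0.0]
read, weights = memory_read(audio_feature, keys, values)

# Assumed fusion step: append the retrieved information to the input feature
# so a downstream predictor sees more than the audio alone.
augmented = audio_feature + read
```

In this toy run the query aligns most with the first key, so the first slot gets the largest weight and dominates the retrieved vector. The idea matches the abstract only at the level of intent: the retrieved vector stands in for information the audio alone does not carry.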
