Generating Talking Face With Controllable Eye Movements by Disentangled Blinking Feature.

Publication information

IEEE Trans Vis Comput Graph. 2023 Dec;29(12):5050-5061. doi: 10.1109/TVCG.2022.3199412. Epub 2023 Nov 10.

Abstract

In virtual reality, talking face generation aims to synthesize realistic talking-face videos from speech and face images, improving the communication experience when information exchange between users is limited. In real video, blinking frequently accompanies speech and is an indispensable part of a realistic talking-face video. However, current methods either pay no attention to generating eye movements or cannot control blinking in the generated results. To this end, this article proposes a novel system that produces vivid talking faces with controllable eye blinks, driven by joint features comprising an identity feature, an audio feature, and a blink feature. To disentangle the blinking action, we design three independent features that individually drive the main components of the generated frame: facial appearance, mouth movements, and eye movements. Through adversarial training of the identity encoder, we filter eye-state information out of the identity feature, thereby strengthening the independence of the blink feature. We introduce the blink score as the leading information of the blink feature; through training, its value becomes consistent with human perception, giving complete and independent control of the eyes. Experimental results on multiple datasets show that our method not only reproduces realistic talking faces but also makes the blinking pattern and timing fully controllable.
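
To make the idea of a controllable blink score concrete, the following is a minimal sketch, not the authors' implementation. It assumes a per-frame score in [0, 1] where 0 means eyes open and 1 means eyes fully closed, a triangular close-and-reopen profile, and a joint feature formed by simple concatenation. The names `blink_score_track` and `joint_feature` and all of these conventions are hypothetical and are not taken from the article.

```python
from typing import List


def blink_score_track(num_frames: int, blink_starts: List[int],
                      blink_len: int = 6) -> List[float]:
    """Build a per-frame blink score for a clip.

    Assumed convention (hypothetical, not from the article):
    0.0 = eyes open, 1.0 = eyes fully closed. Each blink starting at
    frame `s` follows a triangular profile over `blink_len` frames.
    """
    scores = [0.0] * num_frames
    half = blink_len / 2.0
    for start in blink_starts:
        for k in range(blink_len + 1):
            t = start + k
            if 0 <= t < num_frames:
                # Rise linearly to 1.0 at the midpoint, then fall back to 0.0.
                value = 1.0 - abs(k - half) / half
                scores[t] = max(scores[t], value)
    return scores


def joint_feature(identity: List[float], audio: List[float],
                  blink_score: float) -> List[float]:
    """Concatenate the three features that drive one generated frame.

    Plain concatenation is an assumption made for illustration only.
    """
    return identity + audio + [blink_score]


# Usage: a 10-frame clip with one blink scheduled to start at frame 2.
track = blink_score_track(num_frames=10, blink_starts=[2])
frame_5_input = joint_feature([0.1, 0.2], [0.3], track[5])
```

Under these assumptions, changing `blink_starts` moves the blinks in time and changing `blink_len` alters their shape, while the identity and audio inputs stay untouched. This mirrors the independence the abstract describes between the blink feature and the other two features.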
