IEEE Trans Image Process. 2021;30:6107-6116. doi: 10.1109/TIP.2021.3089909. Epub 2021 Jul 7.
Recent research has witnessed advances in facial image editing tasks, including face swapping and face reenactment. However, these methods are confined to dealing with one specific task at a time. In addition, for video facial editing, previous methods either simply apply transformations frame by frame or utilize multiple frames in a concatenated or iterative fashion, which leads to noticeable visual flicker. In this paper, we propose a unified, temporally consistent facial video editing framework termed UniFaceGAN. Based on a 3D reconstruction model and a simple yet efficient dynamic training sample selection mechanism, our framework is designed to handle face swapping and face reenactment simultaneously. To enforce temporal consistency, a novel 3D temporal loss constraint is introduced based on barycentric coordinate interpolation. Moreover, we propose a region-aware conditional normalization layer to replace the traditional AdaIN or SPADE layers and synthesize more context-harmonious results. Compared with state-of-the-art facial image editing methods, our framework generates video portraits that are more photo-realistic and temporally smooth.
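The abstract does not spell out the 3D temporal loss, so the following is only a minimal PyTorch sketch of one plausible reading: barycentric coordinates tie each pixel to a point on the tracked face mesh, which yields dense correspondences between consecutive frames that a photometric consistency term can exploit. All names here (interpolate_attribute, temporal_consistency_loss, corr, face_ids) are hypothetical, and the warping scheme is an assumption, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def interpolate_attribute(vertex_attr, faces, face_ids, bary_coords):
    """Barycentric interpolation of per-vertex attributes onto a pixel grid.

    vertex_attr: (V, D) per-vertex attribute (e.g. projected 2D positions)
    faces:       (F, 3) vertex indices of each mesh triangle
    face_ids:    (H, W) index of the triangle covering each pixel
                 (background pixels should be masked out downstream)
    bary_coords: (H, W, 3) barycentric weights of each pixel in its triangle
    """
    tri = faces[face_ids.clamp(min=0)]                         # (H, W, 3)
    corners = vertex_attr[tri]                                 # (H, W, 3, D)
    return (bary_coords.unsqueeze(-1) * corners).sum(dim=-2)   # (H, W, D)

def temporal_consistency_loss(frame_t, frame_t1, corr, mask):
    """Photometric consistency between frames linked by mesh correspondences.

    frame_t, frame_t1: (1, C, H, W) consecutive generated frames
    corr: (1, H, W, 2) for each pixel of frame t+1, the normalized [-1, 1]
          location in frame t of the same mesh surface point, obtained by
          barycentric interpolation of the projected vertex positions
    mask: (1, 1, H, W) face-region validity mask for frame t+1
    """
    # Warp frame t to frame t+1 through the mesh correspondences, then
    # penalize appearance differences inside the face region.
    warped = F.grid_sample(frame_t, corr, align_corners=True)
    return (mask * (warped - frame_t1).abs()).mean()
```

Similarly, the region-aware conditional normalization layer is described only by contrast with AdaIN and SPADE. Below is a hedged, SPADE-style sketch in which each semantic face region (e.g. skin, eyes, mouth, background) gets its own small modulation branch; the class name, branch layout, and hidden width are illustrative assumptions rather than the paper's architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RegionAwareNorm(nn.Module):
    """Sketch of a region-aware conditional normalization layer.

    Like SPADE, it predicts per-pixel scale/shift maps from a conditioning
    input, but each semantic region has its own branch, so modulation
    statistics are not shared across unrelated facial regions.
    """
    def __init__(self, channels, num_regions, hidden=64):
        super().__init__()
        self.norm = nn.InstanceNorm2d(channels, affine=False)
        # One modulation branch per region; all shapes are illustrative.
        self.branches = nn.ModuleList(
            nn.Sequential(
                nn.Conv2d(num_regions, hidden, 3, padding=1),
                nn.ReLU(inplace=True),
                nn.Conv2d(hidden, 2 * channels, 3, padding=1),
            )
            for _ in range(num_regions)
        )

    def forward(self, x, seg):
        # x: (B, C, H, W) features;  seg: (B, R, H, W) one-hot region masks
        seg = F.interpolate(seg, size=x.shape[2:], mode="nearest")
        h = self.norm(x)
        out = torch.zeros_like(x)
        for r, branch in enumerate(self.branches):
            gamma, beta = branch(seg).chunk(2, dim=1)
            region = seg[:, r:r + 1]        # restrict modulation to region r
            out = out + region * (h * (1 + gamma) + beta)
        return out
```

With one-hot region masks, the per-region outputs tile the image without overlap, so the layer reduces to SPADE when num_regions is 1.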