
FACEMUG: A Multimodal Generative and Fusion Framework for Local Facial Editing.

Authors

Lu Wanglong, Wang Jikai, Jin Xiaogang, Jiang Xianta, Zhao Hanli

Publication

IEEE Trans Vis Comput Graph. 2024 Jul 26;PP. doi: 10.1109/TVCG.2024.3434386.

Abstract

Existing facial editing methods have achieved remarkable results, yet they often fall short in supporting multimodal conditional local facial editing. One telling symptom is that their output image quality degrades dramatically after several iterations of incremental editing, because they do not support local editing. In this paper, we present a novel multimodal generative and fusion framework for globally-consistent local facial editing (FACEMUG) that can handle a wide range of input modalities and enable fine-grained and semantic manipulation while keeping unedited parts unchanged. Different modalities, including sketches, semantic maps, color maps, exemplar images, text, and attribute labels, are adept at conveying diverse conditioning details, and their combined synergy can provide more explicit guidance for the editing process. We thus integrate all modalities into a unified generative latent space to enable multimodal local facial edits. Specifically, a novel multimodal feature fusion mechanism is proposed that utilizes multimodal aggregation and style fusion blocks to fuse facial priors and multimodal features in both latent and feature spaces. We further introduce a novel self-supervised latent warping algorithm to rectify misaligned facial features, efficiently transferring the pose of the edited image to the given latent codes. We evaluate FACEMUG through extensive experiments and comparisons to state-of-the-art (SOTA) methods. The results demonstrate the superiority of FACEMUG in terms of editing quality, flexibility, and semantic control, making it a promising solution for a wide range of local facial editing tasks.
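The abstract's guarantee that unedited regions stay unchanged is characteristic of mask-guided blending, where edited and original features are combined under a spatial mask. The sketch below illustrates that general idea only; it is not FACEMUG's actual fusion mechanism, and `masked_blend` and its shapes are hypothetical names chosen for illustration.

```python
import numpy as np

def masked_blend(original, edited, mask):
    """Blend edited features into the original only inside the mask.

    original, edited: (H, W, C) feature maps.
    mask: (H, W) float array in [0, 1]; 1 marks the region to edit.
    Outside the mask the original values pass through unchanged.
    """
    m = mask[..., None]  # add a channel axis so the mask broadcasts over C
    return m * edited + (1.0 - m) * original

# Toy usage: "edit" only the left half of a 4x4 feature map.
orig = np.zeros((4, 4, 3))
edit = np.ones((4, 4, 3))
mask = np.zeros((4, 4))
mask[:, :2] = 1.0

out = masked_blend(orig, edit, mask)
```

After blending, the left two columns take the edited values while the right two columns remain identical to the original, which is the property that makes iterative local edits possible without degrading untouched regions.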

