IEEE Trans Cybern. 2022 Jun;52(6):4825-4836. doi: 10.1109/TCYB.2021.3071172. Epub 2022 Jun 16.
Modifying facial attributes without a paired dataset is a challenging task. Previous approaches either required supervision from ground-truth transformed images or required training a separate model to map every pair of attributes. This limits scalability to larger attribute sets, since the number of models to train grows quadratically with the number of attributes. Another major drawback of previous approaches is the unintentional alteration of the person's identity as the facial attributes are transformed. We propose a method that allows controllable and identity-aware transformations across multiple facial attributes using only a single model. Our approach is to train a generative adversarial network (GAN) with a multitask conditional discriminator that recognizes the identity of the face, distinguishes real images from fake ones, and identifies the facial attributes present in an image. This guides the generator toward producing output that is realistic while preserving the person's identity and facial attributes. Through this framework, our model also learns meaningful image representations in a lower-dimensional latent space and semantically associates separate parts of the encoded vector with the person's identity and facial attributes. This opens up the possibility of generating new faces and of other transformations, such as making the face thinner or chubbier. Furthermore, our model encodes the image only once and allows multiple transformations on the encoded vector, which makes transformations faster since the entire image does not need to be reprocessed for every edit. We show the effectiveness of our proposed method through both qualitative and quantitative evaluations, including ablation studies, visual inspection, and face verification. Competitive results are achieved compared to the main baseline (CycleGAN), while the use of a single model yields large gains in storage and extensibility.
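To make the architecture described above concrete, the following is a minimal PyTorch-style sketch, not the authors' implementation: it assumes a three-head discriminator (real/fake, identity, attributes) on a shared convolutional backbone, and an encoder whose latent code is split into an identity part and an attribute part so that one encoding pass supports many edits. All layer sizes, the latent split (id_dim, attr_dim), class counts, and the class and head names are illustrative assumptions.

import torch
import torch.nn as nn

class MultiTaskDiscriminator(nn.Module):
    """Shared conv backbone with three heads: real/fake, identity, attributes (assumed layout)."""
    def __init__(self, num_identities=1000, num_attributes=40):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(128, 256, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.real_fake_head = nn.Linear(256, 1)                # adversarial score
        self.identity_head = nn.Linear(256, num_identities)    # who is in the image
        self.attribute_head = nn.Linear(256, num_attributes)   # which attributes are present

    def forward(self, x):
        h = self.backbone(x)
        return self.real_fake_head(h), self.identity_head(h), self.attribute_head(h)

class SplitLatentGenerator(nn.Module):
    """Encode once into (identity code, attribute code); edit only the attribute code."""
    def __init__(self, id_dim=128, attr_dim=40):
        super().__init__()
        self.id_dim, self.attr_dim = id_dim, attr_dim
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(128, id_dim + attr_dim),
        )
        self.decoder = nn.Sequential(           # toy decoder back to a 32x32 image
            nn.Linear(id_dim + attr_dim, 128 * 8 * 8), nn.ReLU(),
            nn.Unflatten(1, (128, 8, 8)),
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 3, 4, stride=2, padding=1), nn.Tanh(),
        )

    def encode(self, x):
        z = self.encoder(x)
        return z[:, :self.id_dim], z[:, self.id_dim:]    # identity part, attribute part

    def decode(self, z_id, z_attr):
        return self.decoder(torch.cat([z_id, z_attr], dim=1))

if __name__ == "__main__":
    G, D = SplitLatentGenerator(), MultiTaskDiscriminator()
    x = torch.randn(2, 3, 32, 32)            # a batch of face crops
    z_id, z_attr = G.encode(x)               # encode the image once
    z_attr_edit = z_attr.clone()
    z_attr_edit[:, 5] = 1.0                  # flip one attribute dimension in the latent code
    y = G.decode(z_id, z_attr_edit)          # reuse z_id for every further edit
    rf, ident, attrs = D(y)                  # discriminator scores realism, identity, attributes
    print(y.shape, rf.shape, ident.shape, attrs.shape)

Because the identity code z_id is computed once and reused, each additional attribute edit only requires another decoder pass, which is the speed advantage the abstract attributes to single-pass encoding.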