IEEE Trans Pattern Anal Mach Intell. 2022 Apr;44(4):2004-2018. doi: 10.1109/TPAMI.2020.3034267. Epub 2022 Mar 4.
Although generative adversarial networks (GANs) have made significant progress in face synthesis, it remains poorly understood what GANs learn in the latent representation that lets them map a random code to a photo-realistic image. In this work, we propose a framework called InterFaceGAN to interpret the disentangled face representation learned by state-of-the-art GAN models and to study the properties of the facial semantics encoded in the latent space. We first find that GANs learn various semantics in some linear subspaces of the latent space. After identifying these subspaces, we can realistically manipulate the corresponding facial attributes without retraining the model. We then conduct a detailed study of the correlation between different semantics and better disentangle them via subspace projection, resulting in more precise control over attribute manipulation. Besides manipulating gender, age, expression, and the presence of eyeglasses, we can even alter the face pose and fix the artifacts accidentally produced by GANs. Furthermore, we perform an in-depth face identity analysis and a layer-wise analysis to evaluate the editing results quantitatively. Finally, we apply our approach to real face editing by employing GAN inversion approaches and by explicitly training feed-forward models on the synthetic data generated with InterFaceGAN. Extensive experimental results suggest that learning to synthesize faces spontaneously brings about a disentangled and controllable face representation.
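The two operations the abstract describes, moving a latent code along a linear semantic direction and disentangling two semantics via subspace projection, can be illustrated with a minimal NumPy sketch. This is one plausible reading of those steps, not the authors' implementation. The function names (`unit`, `edit_latent`, `project_out`), the 512-dimensional latent size, and the random vectors standing in for learned attribute directions are all assumptions made for illustration.

```python
import numpy as np


def unit(v):
    """Return v scaled to unit length."""
    v = np.asarray(v, dtype=np.float64)
    return v / np.linalg.norm(v)


def edit_latent(z, direction, alpha):
    """Shift latent code z by alpha along the unit-normalized direction."""
    return np.asarray(z, dtype=np.float64) + alpha * unit(direction)


def project_out(primary, conditioned):
    """Remove from `primary` its component along `conditioned`.

    The result is renormalized and orthogonal to `conditioned`, so moving
    along it leaves the linear score of the conditioned attribute unchanged.
    Undefined if the two directions are parallel (the residual is zero).
    """
    n1, n2 = unit(primary), unit(conditioned)
    residual = n1 - np.dot(n1, n2) * n2
    return unit(residual)


# Hypothetical stand-ins: a real pipeline would obtain these directions
# from a trained generator's latent space, not from random noise.
rng = np.random.default_rng(0)
z = rng.standard_normal(512)            # assumed latent dimensionality
age_dir = rng.standard_normal(512)      # placeholder "age" direction
glasses_dir = rng.standard_normal(512)  # placeholder "eyeglasses" direction

age_only = project_out(age_dir, glasses_dir)
z_edit = edit_latent(z, age_only, alpha=3.0)
```

In this sketch the edited code `z_edit` would be fed back to the generator to render the manipulated face. The step size `alpha` controls how strongly the attribute changes, and a negative value moves in the opposite direction.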