State Key Laboratory for Novel Software Technology, Nanjing University, China.
Department of Computer Science, University of Rochester, USA.
Neural Netw. 2020 Dec;132:66-74. doi: 10.1016/j.neunet.2020.08.011. Epub 2020 Aug 20.
Caricature generation is an interesting yet challenging task. The primary goal is to generate a plausible caricature with reasonable exaggerations given a face image. Conventional caricature generation approaches mainly use low-level geometric transformations such as image warping to generate exaggerated images, which lack richness and diversity in terms of content and style. Recent progress in generative adversarial networks (GANs) makes it possible to learn an image-to-image transformation from data so as to generate diverse images. However, directly applying GAN-based models to this task leads to unsatisfactory results due to the large variance in the caricature distribution. Moreover, conventional models typically require pixel-wise paired training data, which largely limits their usage scenarios. In this paper, we model caricature generation as a weakly paired image-to-image translation task, and propose CariGAN to address these issues. Specifically, to enforce reasonable exaggeration and facial deformation, manually annotated caricature facial landmarks are used as an additional condition to constrain the generated image. Furthermore, an image fusion mechanism is designed to encourage the model to focus on the key facial parts so that more vivid details can be generated in these regions. Finally, a diversity loss is proposed to encourage the model to produce diverse results. Extensive experiments on the large-scale "WebCaricature" dataset show that the proposed CariGAN can generate more visually plausible caricatures with greater diversity than state-of-the-art models.
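The image fusion mechanism mentioned above can be pictured as a mask-weighted blend: a spatial attention mask emphasizes key facial parts (eyes, nose, mouth), keeping fine generated detail there while retaining a coarser output elsewhere. The sketch below is an illustrative assumption, not the paper's exact formulation; the function name `fuse` and the source of the mask are hypothetical.

```python
import numpy as np

def fuse(generated, coarse, mask):
    """Blend two images with a spatial mask in [0, 1].

    mask is assumed high on key facial regions, so fine detail from
    `generated` is kept there and `coarse` fills in the background.
    All arrays share the same H x W (x C) shape.
    """
    mask = np.clip(mask, 0.0, 1.0)
    return mask * generated + (1.0 - mask) * coarse
```

With a mask of all ones the result equals `generated`; with all zeros it equals `coarse`, so the mask smoothly interpolates between the two sources per pixel.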
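A diversity loss of the kind the abstract describes is often implemented in a mode-seeking style: for two different latent codes fed to the generator, penalize outputs that are nearly identical relative to how far apart the codes are. The snippet below is a minimal sketch of that general idea (here in NumPy with L1 distances), not the paper's exact loss.

```python
import numpy as np

def diversity_loss(img1, img2, z1, z2, eps=1e-8):
    """Mode-seeking style diversity term (illustrative sketch).

    img1, img2: generator outputs for latent codes z1, z2.
    Returns a loss that is minimized (most negative) when distinct
    latent codes map to distinct images, discouraging mode collapse.
    """
    img_dist = np.mean(np.abs(img1 - img2))
    z_dist = np.mean(np.abs(z1 - z2))
    return -img_dist / (z_dist + eps)
```

If the generator collapses (identical outputs for different codes), the loss is 0; the more the outputs differ per unit of latent distance, the lower (better) the loss, so minimizing it alongside the adversarial objective pushes toward diverse caricatures.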