Yu Shikang, Han Hu, Shan Shiguang, Chen Xilin
IEEE Trans Image Process. 2023;32:144-158. doi: 10.1109/TIP.2022.3226413. Epub 2022 Dec 19.
Cross-modality face image synthesis such as sketch-to-photo, NIR-to-RGB, and RGB-to-depth has wide applications in face recognition, face animation, and digital entertainment. Conventional cross-modality synthesis methods usually require paired training data, i.e., images of both modalities for each subject. However, paired data can be difficult to acquire, while unpaired data are commonly available. In this paper, we propose a novel semi-supervised cross-modality synthesis method (namely CMOS-GAN), which can leverage both paired and unpaired face images to learn a robust cross-modality synthesis model. Specifically, CMOS-GAN uses a generator with an encoder-decoder architecture for new-modality synthesis. We leverage pixel-wise loss, adversarial loss, classification loss, and face feature loss to exploit the information in both paired multi-modality face images and unpaired face images for model learning. In addition, since we expect the synthesized images in the new modality to also help improve face recognition accuracy, we further use a modified triplet loss to retain the discriminative features of the subject in the synthesized modality. Experiments on three cross-modality face synthesis tasks (NIR-to-VIS, RGB-to-depth, and sketch-to-photo) show the effectiveness of the proposed approach compared with state-of-the-art methods. In addition, we also collect a large-scale RGB-D dataset (VIPL-MumoFace-3K) for the RGB-to-depth synthesis task. We plan to open-source our code and the VIPL-MumoFace-3K dataset to the community (https://github.com/skgyu/CMOS-GAN).
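The abstract does not specify how the modified triplet loss is defined, but the standard triplet loss it builds on pulls an anchor embedding toward a positive (same subject) and pushes it away from a negative (different subject) by at least a margin. The sketch below illustrates that baseline formulation with NumPy; the function name, margin value, and use of squared Euclidean distance are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Standard triplet loss on face embeddings (illustrative sketch).

    anchor:   embedding of a real image of subject A
    positive: embedding of a synthesized image of subject A
    negative: embedding of an image of a different subject B
    """
    # Squared Euclidean distances between embeddings
    d_ap = np.sum((np.asarray(anchor) - np.asarray(positive)) ** 2)
    d_an = np.sum((np.asarray(anchor) - np.asarray(negative)) ** 2)
    # Penalize only when the positive is not closer than the
    # negative by at least the margin
    return max(0.0, d_ap - d_an + margin)
```

In the paper's setting, such a loss would be applied to embeddings of synthesized new-modality images so that identity-discriminative structure survives the synthesis, rather than only optimizing pixel-level fidelity.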