Qi Xingqun, Sun Muyi, Wang Zijian, Liu Jiaming, Li Qi, Zhao Fang, Zhang Shanghang, Shan Caifeng
IEEE Trans Neural Netw Learn Syst. 2025 Feb;36(2):2182-2195. doi: 10.1109/TNNLS.2023.3341246. Epub 2025 Feb 6.
Biphasic face photo-sketch synthesis has significant practical value in wide-ranging fields such as digital entertainment and law enforcement. Previous approaches generate the photo or sketch directly from a global view; they often suffer from low sketch quality and complex photograph variations, leading to unnatural and low-fidelity results. In this article, we propose a novel semantic-driven generative adversarial network, cooperating with graph representation learning, to address these issues. Considering that human faces have distinct spatial structures, we first inject class-wise semantic layouts into the generator to provide style-based spatial information for the synthesized face photographs and sketches. In addition, to enhance the authenticity of details in the generated faces, we construct two types of representational graphs from semantic parsing maps of the input faces, dubbed the intraclass semantic graph (IASG) and the interclass structure graph (IRSG). Specifically, the IASG models the intraclass semantic correlations within each facial semantic component, thus producing realistic facial details. To keep the generated faces structurally coordinated, the IRSG models the interclass structural relations among the facial components via graph representation learning. To further enhance the perceptual quality of the synthesized images, we present a biphasic interactive cycle training strategy that fully exploits the multilevel feature consistency between the photograph and the sketch. Extensive experiments demonstrate that our method outperforms state-of-the-art competitors on the CUHK Face Sketch (CUFS) and CUHK Face Sketch FERET (CUFSF) datasets.
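The graph construction described above can be pictured with a minimal sketch. The PyTorch snippet below is not the authors' implementation; the class count NUM_CLASSES, the helper pool_component_nodes, and the single-layer GraphConv are illustrative assumptions. It builds one node per facial component by pooling backbone features over the semantic parsing map, then applies one graph-convolution step as a stand-in for the interclass relation modeling of the IRSG:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical number of facial semantic classes (e.g., skin, brows, eyes,
# nose, lips, hair, ears, background); the paper's exact set is not stated here.
NUM_CLASSES = 8

def pool_component_nodes(feat, parsing):
    """Build one graph node per facial component by average-pooling the
    backbone features inside each semantic region of the parsing map.

    feat:    (B, C, H, W) feature map
    parsing: (B, H, W) integer labels in [0, NUM_CLASSES)
    returns: (B, NUM_CLASSES, C) node embeddings
    """
    onehot = F.one_hot(parsing, NUM_CLASSES).permute(0, 3, 1, 2).float()  # (B, K, H, W)
    area = onehot.sum(dim=(2, 3)).clamp(min=1.0)                          # (B, K)
    nodes = torch.einsum('bchw,bkhw->bkc', feat, onehot) / area.unsqueeze(-1)
    return nodes

class GraphConv(nn.Module):
    """One propagation step H' = ReLU(A_norm H W) over component nodes,
    standing in for the graph representation learning on the IRSG."""

    def __init__(self, dim):
        super().__init__()
        self.proj = nn.Linear(dim, dim, bias=False)

    def forward(self, nodes, adj):
        # Row-normalize the adjacency so each node averages over its neighbors.
        adj = adj / adj.sum(dim=-1, keepdim=True).clamp(min=1e-6)
        return F.relu(self.proj(torch.bmm(adj, nodes)))

# Toy usage: fully connected adjacency among the facial components.
feat = torch.randn(2, 64, 32, 32)                     # backbone features
parsing = torch.randint(0, NUM_CLASSES, (2, 32, 32))  # semantic parsing map
nodes = pool_component_nodes(feat, parsing)           # (2, 8, 64)
adj = torch.ones(2, NUM_CLASSES, NUM_CLASSES)
refined = GraphConv(64)(nodes, adj)                   # (2, 8, 64)
```

In the paper, the refined node embeddings would be fed back into the generator's feature maps; here the fully connected adjacency is only a placeholder for whatever interclass structure the IRSG actually learns.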