Guo Pu, Yifang Men, Yiming Mao, Yuning Jiang, Wei-Ying Ma, Zhouhui Lian
IEEE Trans Pattern Anal Mach Intell. 2023 Feb;45(2):1514-1532. doi: 10.1109/TPAMI.2022.3161985. Epub 2023 Jan 6.
This paper proposes Attribute-Decomposed GAN (ADGAN) and its enhanced version (ADGAN++) for controllable image synthesis, which can produce realistic images with desired attributes provided in various source inputs. The core idea of both ADGAN and ADGAN++ is to embed component attributes into the latent space as independent codes, thereby achieving flexible and continuous control of attributes via mixing and interpolation operations on explicit style representations. The major difference between them is that ADGAN processes all component attributes simultaneously, while ADGAN++ adopts a serial encoding strategy. More specifically, ADGAN consists of two encoding pathways with style block connections and decomposes the original hard mapping into multiple more tractable subtasks. In the source pathway, component layouts are extracted via a semantic parser, and the segmented components are fed into a shared global texture encoder to obtain decomposed latent codes. This strategy allows for the synthesis of more realistic output images and the automatic separation of un-annotated component attributes. Although the original ADGAN is elegant and efficient, it intrinsically fails to handle the semantic image synthesis task when the number of attribute categories is large. To address this problem, ADGAN++ serially encodes the different component attributes to synthesize each part of the target real-world image, and adopts several residual blocks with segmentation-guided instance normalization to assemble the synthesized component images and refine the initial synthesis result. The two-stage ADGAN++ is designed to alleviate the massive computational cost of synthesizing real-world images with numerous attributes, while maintaining the disentanglement of different attributes to enable flexible control over arbitrary component attributes of the synthesized images.
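The core mechanism described above, per-component latent codes controlled through mixing and interpolation, can be sketched in a minimal numpy example. All names here (component list, code dimension, the `encode` stand-in) are illustrative assumptions, not the authors' actual API; the real encoder is a learned network over segmented image regions.

```python
import numpy as np

# Illustrative component set and code size; the actual model learns
# codes for parsed semantic regions of a person image.
COMPONENTS = ["hair", "upper_clothes", "pants", "face"]
CODE_DIM = 8

def encode(rng):
    """Stand-in for the shared global texture encoder: it produces one
    independent latent code per semantic component."""
    return {c: rng.standard_normal(CODE_DIM) for c in COMPONENTS}

def mix(codes_a, codes_b, take_from_b):
    """Component attribute transfer: copy the listed component codes
    from image B's representation into image A's, leaving the rest."""
    return {c: (codes_b[c] if c in take_from_b else codes_a[c]).copy()
            for c in COMPONENTS}

def interpolate(codes_a, codes_b, t):
    """Continuous attribute control: linearly blend each component code
    between the two source representations."""
    return {c: (1 - t) * codes_a[c] + t * codes_b[c] for c in COMPONENTS}

rng = np.random.default_rng(0)
a, b = encode(rng), encode(rng)
mixed = mix(a, b, take_from_b={"hair"})   # A's look with B's hair
half = interpolate(a, b, 0.5)             # midway blend of all attributes
```

Because each component has its own code, swapping or blending one attribute leaves the others untouched, which is exactly the disentanglement the paper exploits.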
Experimental results demonstrate the proposed methods' superiority over the state of the art in pose transfer, face style transfer, and semantic image synthesis, as well as their effectiveness in the task of component attribute transfer. Our code and data are publicly available at https://github.com/menyifang/ADGAN.
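The segmentation-guided instance normalization used in ADGAN++'s refinement stage can be sketched as follows, assuming a SPADE-like formulation: features are instance-normalized, then rescaled and shifted by spatially varying parameters derived from the segmentation map. The function and parameter names are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def seg_guided_instance_norm(x, seg_onehot, gamma_w, beta_w, eps=1e-5):
    """Sketch of segmentation-guided instance normalization.
    x:          (C, H, W) feature map
    seg_onehot: (K, H, W) one-hot segmentation over K classes
    gamma_w:    (C, K) per-class scale weights (illustrative)
    beta_w:     (C, K) per-class shift weights (illustrative)
    """
    # Instance normalization: zero mean, unit variance per channel.
    mean = x.mean(axis=(1, 2), keepdims=True)
    var = x.var(axis=(1, 2), keepdims=True)
    x_hat = (x - mean) / np.sqrt(var + eps)
    # Spatially varying scale/shift looked up from the segmentation map,
    # so each semantic region is modulated by its own parameters.
    gamma = np.einsum("ck,khw->chw", gamma_w, seg_onehot)
    beta = np.einsum("ck,khw->chw", beta_w, seg_onehot)
    return gamma * x_hat + beta

# Tiny example: 2 channels, 2x3 features, 2 segmentation classes.
x = np.arange(12, dtype=float).reshape(2, 2, 3)
seg = np.zeros((2, 2, 3))
seg[0, :, :2] = 1.0   # class 0 covers the left columns
seg[1, :, 2:] = 1.0   # class 1 covers the right column
out = seg_guided_instance_norm(x, seg, np.ones((2, 2)), np.zeros((2, 2)))
```

With identity modulation (unit scale, zero shift) the output reduces to plain instance normalization; learned per-class weights let the segmentation layout steer how each assembled component region is refined.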