Tang Hao, Sebe Nicu
IEEE Trans Image Process. 2021;30:7903-7913. doi: 10.1109/TIP.2021.3109531. Epub 2021 Sep 20.
In this paper, we address the task of layout-to-image translation, which aims to translate an input semantic layout into a realistic image. One open challenge widely observed in existing methods is the lack of effective semantic constraints during the image translation process, leading to models that cannot preserve the semantic information and that ignore the semantic dependencies within the same object. To address this issue, we propose a novel Double Pooling GAN (DPGAN) for generating photo-realistic and semantically consistent results from the input layout. We also propose a novel Double Pooling Module (DPM), which consists of the Square-shape Pooling Module (SPM) and the Rectangle-shape Pooling Module (RPM). Specifically, SPM aims to capture short-range semantic dependencies of the input layout at different spatial scales, while RPM aims to capture long-range semantic dependencies along both the horizontal and vertical directions. We then fuse the outputs of SPM and RPM to further enlarge the receptive field of our generator. Extensive experiments on five popular datasets show that the proposed DPGAN achieves better results than state-of-the-art methods. Finally, both SPM and RPM are general and can be seamlessly integrated into any GAN-based architecture to strengthen the feature representation. The code is available at https://github.com/Ha0Tang/DPGAN.
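The double pooling idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation (see the linked repository for that) and makes several assumptions: a single-channel feature map stored as a list of lists, average pooling for both branches, nearest-neighbour upsampling, and a plain average as the fusion step. The function names `square_pool`, `rectangle_pool`, and `double_pool` are hypothetical.

```python
def square_pool(feat, bins):
    """Average-pool `feat` over a bins x bins grid of square-ish cells,
    then copy each cell mean back to every position in that cell.
    Assumes bins <= height and bins <= width."""
    h, w = len(feat), len(feat[0])
    out = [[0.0] * w for _ in range(h)]
    for bi in range(bins):
        r0, r1 = bi * h // bins, (bi + 1) * h // bins
        for bj in range(bins):
            c0, c1 = bj * w // bins, (bj + 1) * w // bins
            cells = [feat[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            mean = sum(cells) / len(cells)
            for r in range(r0, r1):
                for c in range(c0, c1):
                    out[r][c] = mean
    return out


def rectangle_pool(feat):
    """Pool over full-width rows and full-height columns (long, thin
    rectangles), then broadcast both back to the original size."""
    h, w = len(feat), len(feat[0])
    row_mean = [sum(row) / w for row in feat]
    col_mean = [sum(feat[r][c] for r in range(h)) / h for c in range(w)]
    return [[(row_mean[r] + col_mean[c]) / 2 for c in range(w)]
            for r in range(h)]


def double_pool(feat, scales=(1, 2)):
    """Fuse the square branches (one per scale) and the rectangle branch.
    Averaging is an assumed stand-in for whatever fusion the paper uses."""
    h, w = len(feat), len(feat[0])
    branches = [square_pool(feat, s) for s in scales] + [rectangle_pool(feat)]
    return [[sum(b[r][c] for b in branches) / len(branches) for c in range(w)]
            for r in range(h)]
```

In this sketch the square branch summarizes local neighbourhoods at each scale, while the rectangle branch lets every position see its entire row and column, which is one way to read the short-range versus long-range distinction made in the abstract.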