Ibrahem Hatem, Salem Ahmed, Kang Hyun-Soo
Department of Information and Communication Engineering, School of Electrical and Computer Engineering, Chungbuk National University, Cheongju-si 28644, Korea.
Electrical Engineering Department, Faculty of Engineering, Assiut University, Assiut 71515, Egypt.
Sensors (Basel). 2022 Oct 29;22(21):8306. doi: 10.3390/s22218306.
In this paper, we revisit paired image-to-image translation with the conditional generative adversarial network known as Pix2Pix and propose efficient optimizations of the architecture and training method to maximize the architecture's performance and boost the realism of the generated images. We propose a generative adversarial network-based technique that creates new artificial indoor scenes from a user-defined semantic segmentation map, which specifies the location, shape, and category of each object in the scene, exactly as in Pix2Pix. We train several residual-connection-based generator and discriminator architectures on the NYU Depth-v2 dataset and a selected indoor subset of the ADE20K dataset, showing that the proposed models have fewer parameters and lower computational complexity, and generate better-quality images than state-of-the-art methods that follow the same approach to generating realistic indoor images. We also show that using extra specific labels and more training samples increases the quality of the generated images; however, compared to Pix2Pix, the proposed residual-connection-based models learn better from small datasets (i.e., NYU Depth-v2) and improve the realism of the generated images when trained on larger datasets (i.e., the ADE20K indoor subset). The proposed method achieves an LPIPS value of 0.505 and an FID value of 81.067, generating better-quality images than those produced by Pix2Pix and other recent paired image-to-image translation methods and outperforming them in terms of LPIPS and FID.
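The abstract does not reproduce the proposed architecture, so the following is only a minimal PyTorch sketch of the kind of residual block that could replace the plain convolutional stages of a Pix2Pix-style generator. The layer sizes, the normalization choice, and the 40-class one-hot segmentation input (a common NYU Depth-v2 label set) are illustrative assumptions, not the paper's published design.

```python
# Minimal, hypothetical sketch of a residual block for a Pix2Pix-style
# generator; all sizes and layer choices below are assumptions, not the
# architecture described in the paper.
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False),
            nn.InstanceNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False),
            nn.InstanceNorm2d(channels),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Skip connection: the block learns a residual on top of its input,
        # which eases optimization and can match the capacity of deeper
        # plain-convolution stacks with fewer parameters.
        return torch.relu(x + self.body(x))

if __name__ == "__main__":
    # A one-hot semantic segmentation map (40 classes assumed here) is the
    # conditional input; the generator would map it to an RGB image.
    seg_map = torch.randn(1, 40, 256, 256)
    stem = nn.Conv2d(40, 64, kernel_size=7, padding=3)
    block = ResidualBlock(64)
    out = block(stem(seg_map))
    print(out.shape)  # torch.Size([1, 64, 256, 256])
```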