Tan Hongchen, Liu Xiuping, Yin Baocai, Li Xin
IEEE Trans Neural Netw Learn Syst. 2023 Dec;34(12):10309-10323. doi: 10.1109/TNNLS.2022.3165573. Epub 2023 Nov 30.
This article presents a new text-to-image (T2I) generation model, named distribution regularization generative adversarial network (DR-GAN), which generates images from text descriptions via improved distribution learning. In DR-GAN, we introduce two novel modules: a semantic disentangling module (SDM) and a distribution normalization module (DNM). The SDM combines a spatial self-attention mechanism (SSAM) with a new semantic disentangling loss (SDL) to help the generator distill key semantic information for image generation. The DNM uses a variational auto-encoder (VAE) to normalize and denoise the latent image distribution, which helps the discriminator better distinguish synthesized images from real ones. The DNM also adopts a distribution adversarial loss (DAL) to guide the generator to align with the normalized real-image distribution in the latent space. Extensive experiments on two public datasets demonstrate that DR-GAN achieves competitive performance on the T2I task. Code: https://github.com/Tan-H-C/DR-GAN-Distribution-Regularization-for-Text-to-Image-Generation
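The abstract names the two modules without implementation detail. Below is a minimal PyTorch sketch of what the SSAM and the DNM's VAE-style normalization could look like, assuming SAGAN-style spatial self-attention and a standard reparameterized Gaussian encoder; all class names, shapes, and the KL term here are illustrative assumptions, not the authors' code (see the linked repository for the actual implementation).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SpatialSelfAttention(nn.Module):
    """Hypothetical SSAM sketch: SAGAN-style self-attention over spatial positions."""

    def __init__(self, channels):
        super().__init__()
        self.query = nn.Conv2d(channels, channels // 8, kernel_size=1)
        self.key = nn.Conv2d(channels, channels // 8, kernel_size=1)
        self.value = nn.Conv2d(channels, channels, kernel_size=1)
        self.gamma = nn.Parameter(torch.zeros(1))  # learned weight on the attention residual

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.query(x).flatten(2).transpose(1, 2)       # (b, hw, c//8)
        k = self.key(x).flatten(2)                          # (b, c//8, hw)
        attn = F.softmax(torch.bmm(q, k), dim=-1)           # (b, hw, hw) position-to-position weights
        v = self.value(x).flatten(2)                        # (b, c, hw)
        out = torch.bmm(v, attn.transpose(1, 2)).view(b, c, h, w)
        return self.gamma * out + x                         # residual connection


class DistributionNormalizer(nn.Module):
    """Hypothetical DNM sketch: VAE-style encoder that maps image features to a
    normalized Gaussian latent before real/fake distributions are compared."""

    def __init__(self, feat_dim, latent_dim):
        super().__init__()
        self.mu = nn.Linear(feat_dim, latent_dim)
        self.logvar = nn.Linear(feat_dim, latent_dim)

    def forward(self, feat):
        mu, logvar = self.mu(feat), self.logvar(feat)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)        # reparameterization trick
        kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())  # pulls latents toward N(0, I)
        return z, kl


if __name__ == "__main__":
    # Toy shapes only, to show the modules run end to end.
    feat_map = torch.randn(2, 64, 16, 16)
    z_img, kl_loss = DistributionNormalizer(64, 32)(SpatialSelfAttention(64)(feat_map).mean(dim=(2, 3)))
    print(z_img.shape, kl_loss.item())
```

In this reading, the KL term plays the "normalize and denoise" role described in the abstract, and an adversarial loss (the paper's DAL) would then be applied on the normalized latents z rather than on raw features.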