Shanghai Key Lab of Modern Optical System, School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, 200093, Shanghai, China.
Neural Netw. 2021 Jun;138:57-67. doi: 10.1016/j.neunet.2021.01.023. Epub 2021 Feb 10.
Synthesizing photo-realistic images from text descriptions is a challenging task in computer vision. Although generative adversarial networks have made significant breakthroughs in this task, they still face substantial challenges in generating high-quality, visually realistic images that are consistent with the semantics of the text. Existing text-to-image methods generally accomplish this task in two steps: first generating an initial image with a rough outline and colors, and then progressively refining it into a high-resolution image. One drawback of these methods is that if the quality of the initial image is low, it is hard to generate a satisfactory high-resolution image from it. In this paper, we propose SAM-GAN, a Self-Attention supporting Multi-stage Generative Adversarial Network, for text-to-image synthesis. With the self-attention mechanism, the model can establish multi-level dependencies within the image and fuse sentence- and word-level visual-semantic vectors to improve the quality of the generated image. Furthermore, a multi-stage perceptual loss is introduced to enhance the semantic similarity between the synthesized image and the real image, thereby strengthening the visual-semantic consistency between text and images. To promote diversity in the generated images, a mode seeking regularization term is integrated into the model. The results of extensive experiments and ablation studies, conducted on the Caltech-UCSD Birds and Microsoft Common Objects in Context datasets, show that our model outperforms competitive models in text-to-image synthesis.
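To make the fusion step concrete, the sketch below shows one plausible way attention can fuse word-level text features with image region features, in the spirit of the self-attention fusion the abstract describes. It is a minimal, hypothetical illustration: the module name WordLevelAttentionFusion, the tensor shapes, and the residual fusion rule are our assumptions, not the authors' SAM-GAN implementation.

```python
# Hypothetical sketch (PyTorch) of fusing word-level text embeddings with
# spatial image features via attention. Shapes and the fusion rule are
# illustrative assumptions, not the paper's actual architecture.
import torch
import torch.nn as nn
import torch.nn.functional as F

class WordLevelAttentionFusion(nn.Module):
    def __init__(self, word_dim: int, img_dim: int):
        super().__init__()
        # Project word embeddings into the image feature space.
        self.proj = nn.Conv1d(word_dim, img_dim, kernel_size=1)

    def forward(self, img_feat: torch.Tensor, words: torch.Tensor) -> torch.Tensor:
        # img_feat: (B, C, H, W) grid of image region features
        # words:    (B, D, T) sequence of word embeddings
        b, c, h, w = img_feat.shape
        regions = img_feat.view(b, c, h * w)                # (B, C, HW)
        keys = self.proj(words)                             # (B, C, T)
        # Each image region attends over all words.
        attn = torch.bmm(regions.transpose(1, 2), keys)     # (B, HW, T)
        attn = F.softmax(attn, dim=-1)
        # Per-region word context vector, fused back residually.
        context = torch.bmm(keys, attn.transpose(1, 2))     # (B, C, HW)
        fused = regions + context
        return fused.view(b, c, h, w)
```

A sentence-level vector could be fused analogously, e.g. broadcast over all regions and concatenated before the next generation stage; the paper's exact combination of the two levels may differ.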
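The two auxiliary objectives can likewise be sketched briefly. Below is a hedged illustration of a perceptual loss applied at every generation stage and of a mode seeking regularization term (in the style of Mao et al.'s Mode Seeking GANs, which the abstract's term echoes). The choice of VGG-16 features, the layer cut-off, the L1 distances, and the epsilon are illustrative assumptions, not the paper's reported settings.

```python
# Hedged sketch of the two auxiliary losses named in the abstract. The VGG
# layer choice and constants are assumptions for illustration only; input
# normalization for VGG is omitted for brevity.
import torch
import torch.nn.functional as F
from torchvision.models import vgg16

# Frozen VGG-16 feature extractor (truncated early, as an assumption).
_vgg = vgg16(pretrained=True).features[:16].eval()
for p in _vgg.parameters():
    p.requires_grad_(False)

def multi_stage_perceptual_loss(fake_stages, real_stages):
    """L1 distance between VGG features of generated and real images,
    summed over all resolution stages of the multi-stage generator."""
    loss = 0.0
    for fake, real in zip(fake_stages, real_stages):
        loss = loss + F.l1_loss(_vgg(fake), _vgg(real))
    return loss

def mode_seeking_reg(img1, img2, z1, z2, eps=1e-5):
    """Encourage output diversity: maximize the ratio of image distance to
    latent distance, implemented here as a reciprocal term to minimize."""
    d_img = F.l1_loss(img1, img2, reduction='mean')
    d_z = F.l1_loss(z1, z2, reduction='mean')
    return 1.0 / (d_img / (d_z + eps) + eps)
```

In training, the generator loss would add both terms with weighting coefficients to the adversarial loss; the weights are hyperparameters we do not attempt to reproduce here.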