
SAM-GAN: Self-Attention supporting Multi-stage Generative Adversarial Networks for text-to-image synthesis.

Affiliations

Shanghai Key Lab of Modern Optical System, School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, 20093, Shanghai, China.

Publication information

Neural Netw. 2021 Jun;138:57-67. doi: 10.1016/j.neunet.2021.01.023. Epub 2021 Feb 10.

DOI: 10.1016/j.neunet.2021.01.023
PMID: 33631607
Abstract

Synthesizing photo-realistic images based on text descriptions is a challenging task in the field of computer vision. Although generative adversarial networks have made significant breakthroughs in this task, they still face huge challenges in generating high-quality visually realistic images consistent with the semantics of text. Generally, existing text-to-image methods accomplish this task with two steps, that is, first generating an initial image with a rough outline and color, and then gradually yielding the image within high-resolution from the initial image. However, one drawback of these methods is that, if the quality of the initial image generation is not high, it is hard to generate a satisfactory high-resolution image. In this paper, we propose SAM-GAN, Self-Attention supporting Multi-stage Generative Adversarial Networks, for text-to-image synthesis. With the self-attention mechanism, the model can establish the multi-level dependence of the image and fuse the sentence- and word-level visual-semantic vectors, to improve the quality of the generated image. Furthermore, a multi-stage perceptual loss is introduced to enhance the semantic similarity between the synthesized image and the real image, thus enhancing the visual-semantic consistency between text and images. For the diversity of the generated images, a mode seeking regularization term is integrated into the model. The results of extensive experiments and ablation studies, which were conducted in the Caltech-UCSD Birds and Microsoft Common Objects in Context datasets, show that our model is superior to competitive models in text-to-image synthesis.
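Two of the components named in the abstract — the self-attention block that lets the generator model long-range dependence across image positions, and the mode seeking regularization term that encourages diverse outputs — can be sketched in a few lines. The following is a minimal NumPy illustration of these general techniques (the function names, projection shapes, and residual scaling are illustrative assumptions, not the paper's actual implementation):

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, Wf, Wg, Wh, gamma=1.0):
    """SAGAN-style self-attention over a flattened feature map.
    x: (N, C) features, N = H*W spatial positions, C channels.
    Wf, Wg: (C, C//8) query/key projections; Wh: (C, C) value projection.
    Each position attends to every other, then a residual is added.
    """
    f = x @ Wf                          # queries, (N, C//8)
    g = x @ Wg                          # keys,    (N, C//8)
    h = x @ Wh                          # values,  (N, C)
    attn = softmax(f @ g.T, axis=-1)    # (N, N) attention map over positions
    o = attn @ h                        # aggregate values from all positions
    return gamma * o + x                # learned-scale residual connection

def mode_seeking_loss(img1, img2, z1, z2, eps=1e-5):
    """Mode seeking regularization (Mao et al., 2019): penalize the
    generator when two different latent codes z1, z2 yield nearly
    identical images, which is the signature of mode collapse."""
    d_img = np.abs(img1 - img2).mean()  # distance in image space
    d_z = np.abs(z1 - z2).mean()        # distance in latent space
    return 1.0 / (d_img / d_z + eps)    # small when images diverge
```

In training, `mode_seeking_loss` would be added to the generator objective for pairs of samples conditioned on the same text, so that distinct noise vectors are pushed toward distinct images.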


Similar articles

1. SAM-GAN: Self-Attention supporting Multi-stage Generative Adversarial Networks for text-to-image synthesis.
Neural Netw. 2021 Jun;138:57-67. doi: 10.1016/j.neunet.2021.01.023. Epub 2021 Feb 10.
2. Word self-update contrastive adversarial networks for text-to-image synthesis.
Neural Netw. 2023 Oct;167:433-444. doi: 10.1016/j.neunet.2023.08.038. Epub 2023 Aug 25.
3. Adversarial text-to-image synthesis: A review.
Neural Netw. 2021 Dec;144:187-209. doi: 10.1016/j.neunet.2021.07.019. Epub 2021 Aug 8.
4. DualG-GAN, a Dual-channel Generator based Generative Adversarial Network for text-to-face synthesis.
Neural Netw. 2022 Nov;155:155-167. doi: 10.1016/j.neunet.2022.08.016. Epub 2022 Aug 19.
5. StackGAN++: Realistic Image Synthesis with Stacked Generative Adversarial Networks.
IEEE Trans Pattern Anal Mach Intell. 2019 Aug;41(8):1947-1962. doi: 10.1109/TPAMI.2018.2856256. Epub 2018 Jul 16.
6. Image manipulation with natural language using Two-sided Attentive Conditional Generative Adversarial Network.
Neural Netw. 2021 Apr;136:207-217. doi: 10.1016/j.neunet.2020.09.002. Epub 2020 Sep 12.
7. Multi-Sentence Auxiliary Adversarial Networks for Fine-Grained Text-to-Image Synthesis.
IEEE Trans Image Process. 2021;30:2798-2809. doi: 10.1109/TIP.2021.3055062. Epub 2021 Feb 12.
8. Semantic Object Accuracy for Generative Text-to-Image Synthesis.
IEEE Trans Pattern Anal Mach Intell. 2022 Mar;44(3):1552-1565. doi: 10.1109/TPAMI.2020.3021209. Epub 2022 Feb 3.
9. Image Generation from Text Using StackGAN with Improved Conditional Consistency Regularization.
Sensors (Basel). 2022 Dec 26;23(1):249. doi: 10.3390/s23010249.
10. Generative Image Inpainting for Retinal Images using Generative Adversarial Networks.
Annu Int Conf IEEE Eng Med Biol Soc. 2021 Nov;2021:2835-2838. doi: 10.1109/EMBC46164.2021.9630619.

Cited by

1. GAN-FDSR: GAN-Based Fault Detection and System Reconfiguration Method.
Sensors (Basel). 2022 Jul 15;22(14):5313. doi: 10.3390/s22145313.
2. Rapid DNA origami nanostructure detection and classification using the YOLOv5 deep convolutional neural network.
Sci Rep. 2022 Mar 9;12(1):3871. doi: 10.1038/s41598-022-07759-3.