Word self-update contrastive adversarial networks for text-to-image synthesis.

Affiliations

College of Information and Communication Engineering, Harbin Engineering University, 150001, Harbin, China.

Institute for Artificial Intelligence, Peking University, Beijing, 100871, China.

Publication information

Neural Netw. 2023 Oct;167:433-444. doi: 10.1016/j.neunet.2023.08.038. Epub 2023 Aug 25.

DOI:10.1016/j.neunet.2023.08.038

PMID:37673029

Abstract

Synthesizing realistic, fine-grained images from text descriptions is an important computer vision task. Although many GAN-based methods have been proposed for this task, generating high-quality images that are consistent with the text remains difficult. Existing GAN-based methods overlook important words because the generator uses fixed initial word features, and their discriminators do not learn the semantic consistency between images and texts. In this article, we propose a novel attentional generation and contrastive adversarial framework for fine-grained text-to-image synthesis, termed Word Self-Update Contrastive Adversarial Networks (WSC-GAN). Specifically, we introduce a dual attention module for modeling color details and semantic information. With a newly designed word self-update module, the generator can leverage visually important words to compute attention maps in the feature synthesis module. Furthermore, we devise multi-branch contrastive discriminators to maintain better consistency between the generated image and the text description. Two novel contrastive losses are proposed for our discriminators to impose image-sentence and image-word consistency constraints. Extensive experiments on the CUB and MS-COCO datasets demonstrate that our method outperforms state-of-the-art methods.
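
The abstract names an image-sentence contrastive loss but does not give its formula. As a rough illustration only, the sketch below shows a generic symmetric InfoNCE-style loss over a batch of paired image and sentence embeddings. It is not the WSC-GAN formulation: the function names, the cosine-similarity scoring, and the `temperature` value are all assumptions made for this example.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale vectors to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def log_softmax(logits, axis=-1):
    """Numerically stable log-softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def image_sentence_contrastive_loss(image_emb, sentence_emb, temperature=0.1):
    """Generic symmetric InfoNCE loss (hypothetical helper, not from the paper).

    image_emb, sentence_emb: arrays of shape (batch, dim); row i of each
    array is assumed to describe the same image-text pair.
    """
    img = l2_normalize(image_emb)
    txt = l2_normalize(sentence_emb)
    # logits[i, j] = scaled cosine similarity of image i and sentence j.
    logits = img @ txt.T / temperature
    idx = np.arange(logits.shape[0])
    # Image-to-text: each image should pick its own sentence within its row.
    loss_i2t = -log_softmax(logits, axis=1)[idx, idx].mean()
    # Text-to-image: each sentence should pick its own image within its column.
    loss_t2i = -log_softmax(logits, axis=0)[idx, idx].mean()
    return 0.5 * (loss_i2t + loss_t2i)
```

Under these assumptions, correctly paired embeddings yield a lower loss than mismatched ones, which is the property a discriminator-side consistency constraint would rely on. An image-word variant would compare image regions with individual word features instead of whole-sentence embeddings.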

