Tan Hongchen, Liu Xiuping, Liu Meng, Yin Baocai, Li Xin
IEEE Trans Image Process. 2021;30:1275-1290. doi: 10.1109/TIP.2020.3026728. Epub 2020 Dec 23.
This paper presents a new framework, Knowledge-Transfer Generative Adversarial Network (KT-GAN), for fine-grained text-to-image generation. We introduce two novel mechanisms, an Alternate Attention-Transfer Mechanism (AATM) and a Semantic Distillation Mechanism (SDM), to help the generator better bridge the cross-domain gap between text and image. The AATM alternately updates the attention weights of words and of image sub-regions, progressively highlighting important word information and enriching the details of synthesized images. The SDM uses the image encoder trained on the Image-to-Image task to guide the training of the text encoder on the Text-to-Image task, producing better text features and higher-quality images. With extensive experimental validation on two public datasets, our KT-GAN outperforms the baseline method significantly and also achieves competitive results across different evaluation metrics.
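The two mechanisms above can be illustrated with a minimal sketch. The abstract gives no equations, so the dot-product attention scoring, the softmax directions, and the L2 distillation loss below are assumptions; the function names are hypothetical. The word-to-region attention captures the AATM idea of maintaining both word-side and region-side attention weights over the same score matrix, and the distillation loss captures the SDM idea of pulling text-encoder features toward those of a frozen image encoder.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def word_to_region_attention(word_feats, region_feats):
    """AATM-style dual attention (assumed form).

    word_feats:   (T, d) word embeddings for T words.
    region_feats: (R, d) features for R image sub-regions.
    Returns two views of the same similarity matrix:
      word_weights:   each word's attention over regions (rows sum to 1).
      region_weights: each region's attention over words (columns sum to 1).
    The alternate updates in AATM would refresh these two sets of
    weights in turn across refinement stages.
    """
    scores = word_feats @ region_feats.T      # (T, R) similarity matrix
    word_weights = softmax(scores, axis=1)    # word -> regions
    region_weights = softmax(scores, axis=0)  # region -> words
    return word_weights, region_weights

def semantic_distillation_loss(text_emb, image_emb):
    # SDM-style loss (assumed form): pull the text encoder's output
    # toward the frozen Image-to-Image encoder's output.
    return float(np.mean((text_emb - image_emb) ** 2))
```

In this reading, the image encoder acts as a teacher: its features for a ground-truth image define a target that the text encoder is trained to match for the paired caption, transferring knowledge across the two tasks.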