Verbal-Person Nets: Pose-Guided Multi-Granularity Language-to-Person Generation

Authors

Liu Deyin, Wu Lin, Zheng Feng, Liu Lingqiao, Wang Meng

Publication information

IEEE Trans Neural Netw Learn Syst. 2023 Nov;34(11):8589-8601. doi: 10.1109/TNNLS.2022.3151631. Epub 2023 Oct 27.

Abstract

Person image generation conditioned on natural language allows us to personalize image editing in a user-friendly manner. This setting, however, involves different granularities of semantic relevance between text and visual content. Given a sentence describing an unknown person, we propose a novel pose-guided multi-granularity attention architecture to synthesize the person image in an end-to-end manner. To determine what to draw at the level of the global outline, the sentence-level description and pose feature maps are incorporated into a U-Net architecture to generate a coarse person image. To further enhance fine-grained details, we propose to draw the human body parts with highly correlated textual nouns and to determine their spatial positions with respect to target pose points. Our model is built on a conditional generative adversarial network (GAN) that translates a language description into a realistic person image. The proposed model is coupled with two-stream discriminators: 1) text-relevant local discriminators that improve the fine-grained appearance by identifying region-text correspondences at the finer level of manipulation and 2) a global full-body discriminator that regulates the generation via a pose-weighting feature selection. Extensive experiments conducted on benchmarks validate the superiority of our method for person image generation.
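
The abstract does not give the attention formulation, so the following is only a minimal sketch of the general idea of matching a body-part region to its most relevant words. It assumes plain dot-product attention with a softmax, which may differ from what the paper uses; the function names (`softmax`, `attend_words`) and the toy 3-dimensional embeddings are hypothetical and chosen purely for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend_words(region_feature, word_features):
    """Attend from one body-part region to every word embedding.

    Assumption: relevance is a plain dot product (generic attention,
    not necessarily the scoring function used in the paper).
    Returns the attention weights and the weighted word context vector.
    """
    scores = [
        sum(r * w for r, w in zip(region_feature, word))
        for word in word_features
    ]
    weights = softmax(scores)
    dim = len(region_feature)
    context = [
        sum(wt * word[i] for wt, word in zip(weights, word_features))
        for i in range(dim)
    ]
    return weights, context


# Hypothetical toy embeddings: one region ("torso") and three nouns.
nouns = ["shirt", "jeans", "shoes"]
word_features = [[2.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 2.0]]
torso_feature = [1.0, 0.0, 0.0]

weights, context = attend_words(torso_feature, word_features)
best_noun = nouns[weights.index(max(weights))]
```

In this toy setup the torso region places most of its attention on the noun whose embedding aligns with it, and the resulting context vector is what a fine-grained generator stage could condition on when drawing that region.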
