Feng Yingchaojie, Wang Xingbo, Wong Kam Kwai, Wang Sijia, Lu Yuhong, Zhu Minfeng, Wang Baicheng, Chen Wei
IEEE Trans Vis Comput Graph. 2024 Jan;30(1):295-305. doi: 10.1109/TVCG.2023.3327168. Epub 2023 Dec 25.
Generative text-to-image models have gained great popularity among the public for their ability to generate high-quality images from natural language prompts. However, writing effective prompts for a desired image can be challenging because natural language is complex and ambiguous. We propose PromptMagician, a visual analysis system that helps users explore the image results and refine their input prompts. The backbone of the system is a prompt recommendation model that takes a user prompt as input, retrieves similar prompt-image pairs from DiffusionDB, and identifies special (important and relevant) prompt keywords. To facilitate interactive prompt refinement, PromptMagician introduces a multi-level visualization of the cross-modal embedding of the retrieved images and recommended keywords, and lets users specify multiple criteria for personalized exploration. Two usage scenarios, a user study, and expert interviews demonstrate the effectiveness and usability of the system, suggesting that it facilitates prompt engineering and improves the creativity support of generative text-to-image models.
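The retrieve-then-recommend idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the tiny in-memory corpus stands in for DiffusionDB, bag-of-words cosine similarity stands in for whatever cross-modal embedding the system actually uses, and counting how often a phrase occurs among the retrieved prompts is only a crude proxy for "important and relevant". All names (`CORPUS`, `retrieve`, `recommend_keywords`) are hypothetical.

```python
import math
from collections import Counter

# Hypothetical stand-in for DiffusionDB: (prompt, image_id) pairs.
CORPUS = [
    ("a castle on a hill, oil painting, dramatic lighting", "img_001"),
    ("a castle in the fog, oil painting, dramatic lighting, highly detailed", "img_002"),
    ("portrait of a cat, studio photo, soft lighting", "img_003"),
    ("a spaceship over a city, concept art, highly detailed", "img_004"),
]


def bag_of_words(text):
    """Lowercased word counts; a toy substitute for a learned embedding."""
    words = (w.strip(",.").lower() for w in text.split())
    return Counter(w for w in words if w)


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items() if word in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def phrases(prompt):
    """Split a prompt into comma-separated keyword phrases."""
    return [p.strip().lower() for p in prompt.split(",") if p.strip()]


def retrieve(user_prompt, corpus, k=2):
    """Return the k prompt-image pairs most similar to the user prompt."""
    query = bag_of_words(user_prompt)
    ranked = sorted(corpus, key=lambda pair: cosine(query, bag_of_words(pair[0])),
                    reverse=True)
    return ranked[:k]


def recommend_keywords(user_prompt, retrieved):
    """Rank phrases from retrieved prompts that the user has not used yet.

    Score = number of retrieved prompts containing the phrase (assumed proxy).
    Ties are broken alphabetically so the output is deterministic.
    """
    already_used = set(phrases(user_prompt))
    counts = Counter()
    for prompt, _image_id in retrieved:
        for phrase in set(phrases(prompt)):
            if phrase not in already_used:
                counts[phrase] += 1
    return [p for p, _ in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))]


if __name__ == "__main__":
    hits = retrieve("a castle, oil painting", CORPUS)
    print([image_id for _prompt, image_id in hits])
    print(recommend_keywords("a castle, oil painting", hits))
```

In this toy setup, a phrase shared by several retrieved prompts rises to the top of the recommendations, while phrases the user already typed are filtered out. A real system would presumably replace both the similarity function and the scoring rule with learned, cross-modal counterparts.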