PromptMagician: Interactive Prompt Engineering for Text-to-Image Creation.

Authors

Feng Yingchaojie, Wang Xingbo, Wong Kam Kwai, Wang Sijia, Lu Yuhong, Zhu Minfeng, Wang Baicheng, Chen Wei

Publication information

IEEE Trans Vis Comput Graph. 2024 Jan;30(1):295-305. doi: 10.1109/TVCG.2023.3327168. Epub 2023 Dec 25.

DOI:10.1109/TVCG.2023.3327168

Abstract

Generative text-to-image models have gained great popularity among the public for their powerful capability to generate high-quality images based on natural language prompts. However, developing effective prompts for desired images can be challenging due to the complexity and ambiguity of natural language. This research proposes PromptMagician, a visual analysis system that helps users explore the image results and refine the input prompts. The backbone of our system is a prompt recommendation model that takes user prompts as input, retrieves similar prompt-image pairs from DiffusionDB, and identifies special (important and relevant) prompt keywords. To facilitate interactive prompt refinement, PromptMagician introduces a multi-level visualization for the cross-modal embedding of the retrieved images and recommended keywords, and supports users in specifying multiple criteria for personalized exploration. Two usage scenarios, a user study, and expert interviews demonstrate the effectiveness and usability of our system, suggesting it facilitates prompt engineering and improves the creativity support of the generative text-to-image model.
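
The retrieve-then-recommend pipeline described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the tiny in-memory `CORPUS` stands in for DiffusionDB, the bag-of-words `embed` function stands in for whatever learned cross-modal embedding the system uses, and the keyword score (frequency among retrieved prompts multiplied by distinctiveness relative to the whole corpus) is only one plausible reading of "important and relevant".

```python
import math
from collections import Counter

# Hypothetical toy corpus standing in for DiffusionDB prompt-image pairs.
CORPUS = [
    ("a castle on a hill, oil painting, dramatic lighting", "img_001.png"),
    ("a castle in the clouds, concept art, dramatic lighting", "img_002.png"),
    ("portrait of a cat, studio photo", "img_003.png"),
    ("dog running on the beach, studio photo", "img_004.png"),
]

def embed(text):
    """Toy bag-of-words vector (assumption: a stand-in for a learned embedding)."""
    return Counter(text.replace(",", " ").lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(user_prompt, corpus, k=2):
    """Return the k prompt-image pairs whose prompts are most similar to the query."""
    query = embed(user_prompt)
    ranked = sorted(corpus, key=lambda pair: cosine(query, embed(pair[0])), reverse=True)
    return ranked[:k]

def keywords(prompt):
    """Treat comma-separated phrases as keywords (a simplifying assumption)."""
    return {kw.strip().lower() for kw in prompt.split(",") if kw.strip()}

def recommend_keywords(user_prompt, corpus, k=2):
    """Rank keywords from retrieved prompts by frequency times distinctiveness."""
    retrieved = retrieve(user_prompt, corpus, k)
    in_retrieved = Counter(kw for prompt, _ in retrieved for kw in keywords(prompt))
    in_corpus = Counter(kw for prompt, _ in corpus for kw in keywords(prompt))
    already_used = keywords(user_prompt)
    scores = {}
    for kw, count in in_retrieved.items():
        if kw in already_used:
            continue
        share_retrieved = count / len(retrieved)   # how common among similar prompts
        share_corpus = in_corpus[kw] / len(corpus)  # how common overall
        scores[kw] = share_retrieved * (share_retrieved / share_corpus)
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda kw: (-scores[kw], kw))

similar = retrieve("a castle", CORPUS)
suggestions = recommend_keywords("a castle", CORPUS)
```

With this toy data, `similar` holds the two castle prompts and `suggestions` lists modifier phrases drawn from them. A real system would replace `embed` with a trained text or image encoder and the corpus with a large prompt-image dataset.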

Similar articles

PromptMagician: Interactive Prompt Engineering for Text-to-Image Creation.

IEEE Trans Vis Comput Graph. 2024 Jan;30(1):295-305. doi: 10.1109/TVCG.2023.3327168. Epub 2023 Dec 25.

MCPL: Multi-Modal Collaborative Prompt Learning for Medical Vision-Language Model.

IEEE Trans Med Imaging. 2024 Dec;43(12):4224-4235. doi: 10.1109/TMI.2024.3418408. Epub 2024 Dec 2.

PrompTHis: Visualizing the Process and Influence of Prompt Editing During Text-to-Image Creation.

IEEE Trans Vis Comput Graph. 2025 Sep;31(9):4547-4559. doi: 10.1109/TVCG.2024.3408255.

Towards Dataset-Scale and Feature-Oriented Evaluation of Text Summarization in Large Language Model Prompts.

IEEE Trans Vis Comput Graph. 2025 Jan;31(1):481-491. doi: 10.1109/TVCG.2024.3456398. Epub 2024 Nov 25.

Text-Guided Image Editing Based on Post Score for Gaining Attention on Social Media.

Sensors (Basel). 2024 Jan 31;24(3):921. doi: 10.3390/s24030921.

Let the Chart Spark: Embedding Semantic Context into Chart with Text-to-Image Generative Model.

IEEE Trans Vis Comput Graph. 2024 Jan;30(1):284-294. doi: 10.1109/TVCG.2023.3326913. Epub 2023 Dec 25.

A Semantic-Based Method for Visualizing Large Image Collections.

IEEE Trans Vis Comput Graph. 2019 Jul;25(7):2362-2377. doi: 10.1109/TVCG.2018.2835485. Epub 2018 May 15.

Animal Pose Estimation Based on Contrastive Learning with Dynamic Conditional Prompts.

Animals (Basel). 2024 Jun 7;14(12):1712. doi: 10.3390/ani14121712.

Interactive and Visual Prompt Engineering for Ad-hoc Task Adaptation with Large Language Models.

IEEE Trans Vis Comput Graph. 2023 Jan;29(1):1146-1156. doi: 10.1109/TVCG.2022.3209479. Epub 2022 Dec 16.

IntentSearch: Capturing User Intention for One-Click Internet Image Search.

IEEE Trans Pattern Anal Mach Intell. 2012 Jul;34(7):1342-53. doi: 10.1109/TPAMI.2011.242. Epub 2011 Dec 13.

Cited by

Advancing Innovation in Medical Presentations: A Guide for Medical Educators to Use Images Generated With Artificial Intelligence.

Cureus. 2024 Dec 2;16(12):e74978. doi: 10.7759/cureus.74978. eCollection 2024 Dec.

KNowNEt: Guided Health Information Seeking from LLMs via Knowledge Graph Integration.

IEEE Trans Vis Comput Graph. 2025 Jan;31(1):547-557. doi: 10.1109/TVCG.2024.3456364. Epub 2024 Dec 3.

Visual Analytics for Efficient Image Exploration and User-Guided Image Captioning.

IEEE Trans Vis Comput Graph. 2024 Jun;30(6):2875-2887. doi: 10.1109/TVCG.2024.3388514. Epub 2024 Jun 19.
