Multimodal Image Synthesis and Editing: The Generative AI Era.

Authors

Zhan Fangneng, Yu Yingchen, Wu Rongliang, Zhang Jiahui, Lu Shijian, Liu Lingjie, Kortylewski Adam, Theobalt Christian, Xing Eric

Publication information

IEEE Trans Pattern Anal Mach Intell. 2023 Dec;45(12):15098-15119. doi: 10.1109/TPAMI.2023.3305243. Epub 2023 Nov 3.

DOI: 10.1109/TPAMI.2023.3305243
PMID: 37624713

Abstract

As information exists in various modalities in the real world, effective interaction and fusion among multimodal information play a key role in the creation and perception of multimodal data in computer vision and deep learning research. With superb power in modeling the interaction among multimodal information, multimodal image synthesis and editing has become a hot research topic in recent years. Instead of providing explicit guidance for network training, multimodal guidance offers intuitive and flexible means for image synthesis and editing. On the other hand, this field also faces several challenges, including alignment of multimodal features, synthesis of high-resolution images, and faithful evaluation metrics. In this survey, we comprehensively contextualize recent advances in multimodal image synthesis and editing and formulate taxonomies according to data modalities and model types. We start with an introduction to different guidance modalities in image synthesis and editing, and then describe multimodal image synthesis and editing approaches extensively according to their model types. After that, we describe benchmark datasets and evaluation metrics as well as corresponding experimental results. Finally, we provide insights about the current research challenges and possible directions for future research.

Similar articles

1
Multimodal Image Synthesis and Editing: The Generative AI Era.
IEEE Trans Pattern Anal Mach Intell. 2023 Dec;45(12):15098-15119. doi: 10.1109/TPAMI.2023.3305243. Epub 2023 Nov 3.
2
A survey on deep multimodal learning for computer vision: advances, trends, applications, and datasets.
Vis Comput. 2022;38(8):2939-2970. doi: 10.1007/s00371-021-02166-7. Epub 2021 Jun 10.
3
FACEMUG: A Multimodal Generative and Fusion Framework for Local Facial Editing.
IEEE Trans Vis Comput Graph. 2024 Jul 26;PP. doi: 10.1109/TVCG.2024.3434386.
4
Adversarial text-to-image synthesis: A review.
Neural Netw. 2021 Dec;144:187-209. doi: 10.1016/j.neunet.2021.07.019. Epub 2021 Aug 8.
5
A layer-wise fusion network incorporating self-supervised learning for multimodal MR image synthesis.
Front Genet. 2022 Aug 9;13:937042. doi: 10.3389/fgene.2022.937042. eCollection 2022.
6
Is image-to-image translation the panacea for multimodal image registration? A comparative study.
PLoS One. 2022 Nov 28;17(11):e0276196. doi: 10.1371/journal.pone.0276196. eCollection 2022.
7
A review on multimodal medical image fusion: Compendious analysis of medical modalities, multimodal databases, fusion techniques and quality metrics.
Comput Biol Med. 2022 May;144:105253. doi: 10.1016/j.compbiomed.2022.105253. Epub 2022 Feb 3.
8
KDE-GAN: A multimodal medical image-fusion model based on knowledge distillation and explainable AI modules.
Comput Biol Med. 2022 Dec;151(Pt A):106273. doi: 10.1016/j.compbiomed.2022.106273. Epub 2022 Nov 3.
9
An Indirect Multimodal Image Registration and Completion Method Guided by Image Synthesis.
Comput Math Methods Med. 2020 Jun 30;2020:2684851. doi: 10.1155/2020/2684851. eCollection 2020.
10
MultiBench: Multiscale Benchmarks for Multimodal Representation Learning.
Adv Neural Inf Process Syst. 2021 Dec;2021(DB1):1-20.

Cited by

1
ESDiff: a joint model for low-quality retinal image enhancement and vessel segmentation using a diffusion model.
Biomed Opt Express. 2023 Nov 29;14(12):6563-6578. doi: 10.1364/BOE.506205. eCollection 2023 Dec 1.
2
VSG-GAN: A high-fidelity image synthesis method with semantic manipulation in retinal fundus image.
Biophys J. 2024 Sep 3;123(17):2815-2829. doi: 10.1016/j.bpj.2024.02.019. Epub 2024 Feb 27.
3
Learning across diverse biomedical data modalities and cohorts: Challenges and opportunities for innovation.
Patterns (N Y). 2024 Jan 17;5(2):100913. doi: 10.1016/j.patter.2023.100913. eCollection 2024 Feb 9.
4
Text-Guided Image Editing Based on Post Score for Gaining Attention on Social Media.
Sensors (Basel). 2024 Jan 31;24(3):921. doi: 10.3390/s24030921.