Diffusion Models in Vision: A Survey.

Author information

Croitoru Florinel-Alin, Hondru Vlad, Ionescu Radu Tudor, Shah Mubarak

Publication information

IEEE Trans Pattern Anal Mach Intell. 2023 Sep;45(9):10850-10869. doi: 10.1109/TPAMI.2023.3261988. Epub 2023 Aug 7.

DOI:10.1109/TPAMI.2023.3261988

Abstract

Denoising diffusion models represent a recent emerging topic in computer vision, demonstrating remarkable results in the area of generative modeling. A diffusion model is a deep generative model that is based on two stages, a forward diffusion stage and a reverse diffusion stage. In the forward diffusion stage, the input data is gradually perturbed over several steps by adding Gaussian noise. In the reverse stage, a model is tasked with recovering the original input data by learning to gradually reverse the diffusion process, step by step. Diffusion models are widely appreciated for the quality and diversity of the generated samples, despite their known computational burdens, i.e., low speeds due to the high number of steps involved during sampling. In this survey, we provide a comprehensive review of articles on denoising diffusion models applied in vision, comprising both theoretical and practical contributions in the field. First, we identify and present three generic diffusion modeling frameworks, which are based on denoising diffusion probabilistic models, noise conditioned score networks, and stochastic differential equations. We further discuss the relations between diffusion models and other deep generative models, including variational auto-encoders, generative adversarial networks, energy-based models, autoregressive models and normalizing flows. Then, we introduce a multi-perspective categorization of diffusion models applied in computer vision. Finally, we illustrate the current limitations of diffusion models and envision some interesting directions for future research.
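
The forward diffusion stage described in the abstract can be illustrated with a small sketch. It is not taken from the survey: it assumes the usual denoising diffusion probabilistic model (DDPM) formulation, in which each step computes x_t = sqrt(1 - beta_t) * x_{t-1} + sqrt(beta_t) * eps with eps drawn from a standard Gaussian, and a linear schedule of 1000 steps from 1e-4 to 0.02. The function names and the four-value toy input are hypothetical, chosen only for illustration.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Return noise variances that grow linearly over the steps (assumed schedule)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """Return alpha_bar_t, the running product of (1 - beta_s) up to each step t."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def forward_step(x_prev, beta, rng):
    """One forward step: scale the signal down slightly and add Gaussian noise."""
    scale, sigma = math.sqrt(1.0 - beta), math.sqrt(beta)
    return [scale * v + sigma * rng.gauss(0.0, 1.0) for v in x_prev]


def forward_jump(x0, alpha_bar, rng):
    """Sample x_t directly from x_0 using the closed-form Gaussian marginal."""
    scale, sigma = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [scale * v + sigma * rng.gauss(0.0, 1.0) for v in x0]


rng = random.Random(0)                 # fixed seed so runs are repeatable
betas = linear_beta_schedule(1000)
alpha_bars = cumulative_alphas(betas)

x0 = [1.0, -0.5, 0.25, 0.0]            # toy input standing in for image pixels
x = x0
for beta in betas:                     # gradual perturbation, step by step
    x = forward_step(x, beta, rng)

x_mid = forward_jump(x0, alpha_bars[499], rng)   # noisy sample at step 500

# Fraction of the original signal variance kept after the first and last step.
print(alpha_bars[0], alpha_bars[-1])
```

Under these assumptions the retained signal fraction shrinks toward zero by the final step, so the last sample is close to pure Gaussian noise. The reverse stage would train a network to undo these steps one at a time, which is also why sampling needs many network evaluations.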

Similar articles

Diffusion models in medical imaging: A comprehensive survey.

Med Image Anal. 2023 Aug;88:102846. doi: 10.1016/j.media.2023.102846. Epub 2023 May 23.

Diffusion models in bioinformatics and computational biology.

Nat Rev Bioeng. 2024 Feb;2(2):136-154. doi: 10.1038/s44222-023-00114-9. Epub 2023 Oct 27.

Semi-Implicit Denoising Diffusion Models (SIDDMs).

Adv Neural Inf Process Syst. 2023 Dec;36:17383-17394. Epub 2024 May 30.

Counterfactual MRI Generation with Denoising Diffusion Models for Interpretable Alzheimer's Disease Effect Detection.

bioRxiv. 2024 Feb 8:2024.02.05.578983. doi: 10.1101/2024.02.05.578983.

Generative Quantum Machine Learning via Denoising Diffusion Probabilistic Models.

Phys Rev Lett. 2024 Mar 8;132(10):100602. doi: 10.1103/PhysRevLett.132.100602.

How Much Is Enough? A Study on Diffusion Times in Score-Based Generative Models.

Entropy (Basel). 2023 Apr 7;25(4):633. doi: 10.3390/e25040633.

Deep Generative Modelling: A Comparative Review of VAEs, GANs, Normalizing Flows, Energy-Based and Autoregressive Models.

IEEE Trans Pattern Anal Mach Intell. 2022 Nov;44(11):7327-7347. doi: 10.1109/TPAMI.2021.3116668. Epub 2022 Oct 4.

Sampling with flows, diffusion, and autoregressive neural networks from a spin-glass perspective.

Proc Natl Acad Sci U S A. 2024 Jul 2;121(27):e2311810121. doi: 10.1073/pnas.2311810121. Epub 2024 Jun 24.

Restoring Vision in Adverse Weather Conditions With Patch-Based Denoising Diffusion Models.

IEEE Trans Pattern Anal Mach Intell. 2023 Aug;45(8):10346-10357. doi: 10.1109/TPAMI.2023.3238179. Epub 2023 Jun 30.

Cited by

Multi-scale diffusion model for underwater image restoration and enhancement.

PLoS One. 2025 Sep 10;20(9):e0331465. doi: 10.1371/journal.pone.0331465. eCollection 2025.

DiffDesign: Controllable diffusion with meta prior for efficient interior design generation.

PLoS One. 2025 Sep 4;20(9):e0331240. doi: 10.1371/journal.pone.0331240. eCollection 2025.

RoadDiffBox: Automatic Road Distress Diagnosis through Controlled Image Generation and Semi-Supervised Learning.

Research (Wash D C). 2025 Aug 25;8:0833. doi: 10.34133/research.0833. eCollection 2025.

Generative Artificial Intelligence in the Metaverse Era: A Review on Models and Applications.

Research (Wash D C). 2025 Aug 19;8:0804. doi: 10.34133/research.0804. eCollection 2025.

Airport-FOD3S: A Three-Stage Detection-Driven Framework for Realistic Foreign Object Debris Synthesis.

Sensors (Basel). 2025 Jul 23;25(15):4565. doi: 10.3390/s25154565.

Enhancing Tip Detection by Pre-Training with Synthetic Data for Ultrasound-Guided Intervention.

Diagnostics (Basel). 2025 Jul 31;15(15):1926. doi: 10.3390/diagnostics15151926.

A generalizable diffusion framework for 3D low-dose and few-view cardiac SPECT imaging.

Med Image Anal. 2025 Jul 30;106:103729. doi: 10.1016/j.media.2025.103729.

A state-of-the-art review of diffusion model applications for microscopic image and micro-alike image analysis.

Front Med (Lausanne). 2025 Jul 16;12:1551894. doi: 10.3389/fmed.2025.1551894. eCollection 2025.

Causal disentanglement for single-cell representations and controllable counterfactual generation.

Nat Commun. 2025 Jul 23;16(1):6775. doi: 10.1038/s41467-025-62008-1.

3D-EDiffMG: 3D equivariant diffusion-driven molecular generation to accelerate drug discovery.

J Pharm Anal. 2025 Jun;15(6):101257. doi: 10.1016/j.jpha.2025.101257. Epub 2025 Mar 5.
