Deveney Teo, Stanczuk Jan, Kreusser Lisa, Budd Chris, Schönlieb Carola-Bibiane
Department of Computer Science, University of Bath, Bath, UK.
Department of Applied Mathematics and Theoretical Physics, University of Cambridge, Cambridge, Cambridgeshire, UK.
Philos Trans A Math Phys Eng Sci. 2025 Jun 5;383(2298):20240503. doi: 10.1098/rsta.2024.0503.
Score-based diffusion models have emerged as one of the most promising frameworks for deep generative modelling, due to both their mathematical foundations and their state-of-the-art performance in many tasks. Empirically, it has been reported that samplers based on ordinary differential equations (ODEs) are inferior to those based on stochastic differential equations (SDEs). In this article, we systematically analyse the difference between the ODE and SDE dynamics of score-based diffusion models and show how this relates to an associated Fokker-Planck equation. We rigorously describe the full range of dynamics and approximations arising when training score-based diffusion models and derive a theoretical upper bound on the Wasserstein 2-distance between the ODE- and SDE-induced distributions in terms of a Fokker-Planck residual. We also show numerically, using explicit comparisons, that conventional score-based diffusion models can exhibit significant differences between the ODE- and SDE-induced distributions. Moreover, we show numerically that reducing this Fokker-Planck residual by adding it as a regularization term during training closes the gap between the ODE- and SDE-induced distributions. Our experiments suggest that this regularization can improve the distribution generated by the ODE; however, this can come at the cost of degraded SDE sample quality. This article is part of the theme issue 'Partial differential equations in data science'.
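The gap between ODE- and SDE-induced distributions described in the abstract can be illustrated with a toy sketch. This is not the article's experimental setup. It assumes a one-dimensional Gaussian data distribution N(0, sigma0^2) and a forward process dx = -0.5*beta*x dt + sqrt(beta) dW with constant beta, so that the true score is known in closed form. The names `marginal_var`, `sample` and `score_scale` are hypothetical; `score_scale` is a crude stand-in for the error of a learned score, which is no longer the score of a solution of the Fokker-Planck equation once it differs from 1.

```python
import numpy as np


def marginal_var(t, sigma0, beta):
    """Variance of x_t when x_0 ~ N(0, sigma0^2) and
    dx = -0.5*beta*x dt + sqrt(beta) dW (assumed toy forward process)."""
    return 1.0 + (sigma0**2 - 1.0) * np.exp(-beta * t)


def sample(mode, score_scale=1.0, sigma0=0.5, beta=1.0, T=5.0,
           n_steps=500, n_samples=20000, seed=0):
    """Integrate the reverse-time SDE ("sde") or the probability flow
    ODE ("ode") from t = T down to t = 0 with Euler steps.

    score_scale = 1.0 uses the exact score -x / var(t); any other value
    mimics an imperfect score approximation (illustrative assumption).
    """
    if mode not in ("sde", "ode"):
        raise ValueError("mode must be 'sde' or 'ode'")
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    x = rng.standard_normal(n_samples)  # prior N(0, 1), close to the law of x_T
    for k in range(n_steps):
        t = T - k * dt
        score = -score_scale * x / marginal_var(t, sigma0, beta)
        f = -0.5 * beta * x  # forward drift
        if mode == "sde":
            # reverse-time SDE: dx = [f - g^2 * score] dt + g dW, run backwards
            noise = rng.standard_normal(n_samples)
            x = x - (f - beta * score) * dt + np.sqrt(beta * dt) * noise
        else:
            # probability flow ODE: dx/dt = f - 0.5 * g^2 * score, run backwards
            x = x - (f - 0.5 * beta * score) * dt
    return x


if __name__ == "__main__":
    for scale in (1.0, 1.2):
        ode_std = sample("ode", score_scale=scale).std()
        sde_std = sample("sde", score_scale=scale).std()
        print(f"score_scale={scale}: ODE std={ode_std:.3f}, SDE std={sde_std:.3f}")
```

With the exact score (`score_scale=1.0`) both samplers should recover a standard deviation close to `sigma0`, up to discretization and Monte Carlo error. With a perturbed score the two samplers no longer target the same distribution, which is the qualitative effect the abstract refers to; the size and direction of the gap in this toy setting say nothing about trained networks.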