Li Zengyi, Chen Yubei, Sommer Friedrich T
Redwood Center for Theoretical Neuroscience, Berkeley, CA 94720, USA.
Department of Physics, University of California Berkeley, Berkeley, CA 94720, USA.
Entropy (Basel). 2023 Sep 22;25(10):1367. doi: 10.3390/e25101367.
Energy-based models (EBMs) assign an unnormalized log probability to data samples. This functionality has a variety of applications, including sample synthesis, data denoising, sample restoration, outlier detection, and Bayesian reasoning. However, training EBMs with standard maximum likelihood is extremely slow because it requires sampling from the model distribution. Score matching potentially alleviates this problem. In particular, denoising-score matching has been successfully used to train EBMs. Trained on noisy data samples with a single fixed noise level, these models learn fast and yield good results in data denoising. However, demonstrations of such models in high-quality sample synthesis of high-dimensional data have been lacking. A recent paper showed that a generative model trained by denoising-score matching achieves excellent sample synthesis when trained on data samples corrupted with multiple levels of noise. Here we provide an analysis and empirical evidence showing that training with multiple noise levels is necessary when the data dimension is high. Leveraging this insight, we propose a novel EBM trained with multiscale denoising-score matching. Our model exhibits data-generation performance comparable to state-of-the-art techniques such as GANs and sets a new baseline for EBMs. The proposed model also provides density information and performs well on an image-inpainting task.
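To make the idea of denoising-score matching over several noise levels concrete, here is a minimal NumPy sketch. It is not the authors' model or loss: the quadratic toy energy, the function names `energy_grad` and `multiscale_dsm_loss`, the noise levels, and the `sigma**2` weighting of each term are all assumptions chosen for illustration. The only ingredients taken from the abstract are that the model is an energy function, that data are corrupted with Gaussian-style noise, and that several noise levels are used.

```python
import numpy as np

def energy_grad(x, mu, s2):
    # Gradient w.r.t. x of a toy energy E(x) = ||x - mu||^2 / (2 * s2).
    # (Hypothetical stand-in for a learned energy network.)
    return (x - mu) / s2

def multiscale_dsm_loss(x, mu, s2, sigmas, rng):
    """Denoising score matching loss averaged over several noise levels.

    x      : (n, d) array of clean data samples
    sigmas : iterable of noise standard deviations (assumed values)
    rng    : numpy.random.Generator used to draw the corruption noise
    """
    total = 0.0
    for sigma in sigmas:
        # Corrupt the data with isotropic Gaussian noise of scale sigma.
        x_noisy = x + sigma * rng.standard_normal(x.shape)
        # Model score is the negative energy gradient at the noisy point.
        model_score = -energy_grad(x_noisy, mu, s2)
        # Score of the Gaussian corruption kernel q(x_noisy | x).
        target_score = (x - x_noisy) / sigma**2
        diff = model_score - target_score
        # sigma^2 weighting (an assumption) keeps terms on a similar scale.
        total += sigma**2 * np.mean(np.sum(diff**2, axis=1))
    return total / len(sigmas)

# Usage: a toy energy centred on the data should score better than one far away.
data = np.random.default_rng(0).standard_normal((500, 2))
sigmas = [0.1, 1.0]
loss_near = multiscale_dsm_loss(data, np.zeros(2), 1.0, sigmas, np.random.default_rng(1))
loss_far = multiscale_dsm_loss(data, np.full(2, 10.0), 1.0, sigmas, np.random.default_rng(1))
print(loss_near, loss_far)
```

In a real setting the analytic gradient would be replaced by automatic differentiation of a neural energy function, and the loss would be minimized with respect to the network parameters; the sketch only evaluates the objective for fixed toy parameters.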