Wang Weilun, Bao Jianmin, Zhou Wengang, Chen Dongdong, Chen Dong, Yuan Lu, Li Houqiang
IEEE Trans Pattern Anal Mach Intell. 2025 May;47(5):3412-3423. doi: 10.1109/TPAMI.2025.3532956. Epub 2025 Apr 8.
We present SinDiffusion, which leverages denoising diffusion models to capture the internal distribution of patches from a single natural image. Previous GAN-based methods for this problem train multiple models at progressively growing scales by default, which accumulates errors and causes characteristic artifacts in the generated results. In this paper, we show that multiple models at progressively growing scales are not essential for learning from a single image, and we propose SinDiffusion, a single diffusion-based model trained at a single scale, which is better suited to this task. Furthermore, we identify that a patch-level receptive field is crucial for diffusion models to capture the image's patch statistics, so we redesign a patch-wise denoising network for SinDiffusion. Coupling these two designs enables SinDiffusion to generate more photorealistic and diverse images from a single image than GAN-based approaches. SinDiffusion can also be applied to various applications, such as text-guided image generation and image outpainting, which are beyond the capability of SinGAN. Extensive experiments on a wide range of images demonstrate the superiority of SinDiffusion for modeling the patch distribution.
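To make the notion of a patch-level receptive field concrete, the following is a minimal sketch of the standard receptive-field arithmetic for a stack of convolutions. The function name `receptive_field` and both layer stacks are hypothetical illustrations and are not the architecture described in the paper; the sketch only shows how removing strided (downsampling) layers keeps each output pixel dependent on a small input patch.

```python
def receptive_field(layers):
    """Return the receptive field (in input pixels, along one axis) of a conv stack.

    layers: list of (kernel_size, stride) tuples, ordered from input to output.
    Uses the standard recurrence: rf += (kernel - 1) * jump; jump *= stride.
    """
    rf, jump = 1, 1  # a single input pixel sees itself; adjacent outputs are 1 pixel apart
    for kernel, stride in layers:
        rf += (kernel - 1) * jump
        jump *= stride
    return rf


# Hypothetical stacks, chosen only for illustration (assumption, not from the paper).
shallow_no_downsampling = [(3, 1)] * 4        # four 3x3 convs, stride 1
deep_with_downsampling = [(3, 1), (3, 2)] * 4  # 3x3 conv followed by a strided 3x3 conv, four times

print(receptive_field(shallow_no_downsampling))  # 9
print(receptive_field(deep_with_downsampling))   # 61
```

Under these assumed stacks, the network without downsampling sees a 9-pixel-wide patch per output pixel, while the downsampling variant sees 61 pixels, which on a small training image may approach the whole picture rather than a patch.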