NADM: Noise-Aware Diffusion Model for Landscape Painting Video Generation.

Author information

Liu Ding-Ming, Li Shao-Wei, Zhou Ruo-Yan, Liang Li-Li, Hong Yong-Guan, Zeng Yuan-Ze, Chang Xiang, Li Li-Jiang, Xu Tian-Shuo, Chao Fei, Shang Changjing, Shen Qiang

Publication information

IEEE Trans Cybern. 2025 Aug;55(8):3686-3698. doi: 10.1109/TCYB.2025.3576752.

Abstract

Landscape painting is a gem of cultural and artistic heritage that showcases the splendor of nature through the deep observations and imaginations of its painters. Limited by traditional techniques, these artworks were confined to static imagery in ancient times, leaving the dynamism of landscapes and the subtleties of artistic sentiment to the viewer's imagination. Recently, emerging text-to-video (T2V) diffusion methods have shown significant promise in video generation, providing hope for the creation of dynamic landscape paintings. However, current T2V methods focus on generating natural videos, emphasizing the capture of details and the authenticity of physical laws. In contrast, landscape painting videos emphasize the overall dynamic aesthetic. In addition, challenges such as the lack of specific datasets, the intricacy of artistic styles, and the creation of extensive, high-quality videos make it difficult for these models to generate landscape painting videos. In this article, we propose landscape painting videos-high definition (LPV-HD), a novel T2V dataset for landscape painting videos, and the noise-aware diffusion model (NADM), a T2V model that utilizes Stable Diffusion. Specifically, we present a motion module featuring a dual attention mechanism to capture the dynamic transformations of landscape imagery, alongside a noise adapter that leverages unsupervised contrastive learning in the latent space to ensure the overall beauty of the landscape painting video. Following the generation of keyframes, we employ optical flow for frame interpolation to enhance video smoothness. Our method not only retains the essence of landscape painting imagery but also achieves dynamic transitions, significantly advancing the field of artistic video generation. Source code and dataset are available at https://github.com/llzlh21/NADM.
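The abstract does not say which contrastive objective the noise adapter uses. As a rough illustration of what contrastive learning in a latent space can look like, the sketch below implements the widely used InfoNCE loss on small plain-Python vectors. Everything in it is an assumption made for illustration: the function names `cosine_similarity` and `info_nce_loss`, the `temperature` value, and the use of short lists as stand-ins for latent features. It is not taken from the NADM code base.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def info_nce_loss(anchor, positive, negatives, temperature=0.1):
    """InfoNCE loss for one anchor (hypothetical helper, not from the paper).

    The loss is low when the anchor is closer to the positive sample than
    to any of the negative samples, and high otherwise.
    """
    pos_logit = cosine_similarity(anchor, positive) / temperature
    neg_logits = [cosine_similarity(anchor, n) / temperature for n in negatives]
    logits = [pos_logit] + neg_logits

    # Numerically stable log-sum-exp over all logits.
    max_logit = max(logits)
    log_sum_exp = max_logit + math.log(
        sum(math.exp(value - max_logit) for value in logits)
    )

    # Negative log-probability of picking the positive among all candidates.
    return log_sum_exp - pos_logit


# Toy latent vectors (illustrative values only).
aligned = info_nce_loss([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0]])
misaligned = info_nce_loss([1.0, 0.0], [0.0, 1.0], [[1.0, 0.0]])
```

In this toy setup `aligned` is close to zero because the positive matches the anchor, while `misaligned` is large because the negative matches it instead. A real model would compute the same quantity on batches of learned latent features with an automatic-differentiation framework.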
