Yue Zijie, Shi Miaojing
College of Electronic and Information Engineering, Tongji University, China.
College of Electronic and Information Engineering, Tongji University, China; Shanghai Institute of Intelligent Science and Technology, Tongji University, China.
Neural Netw. 2025 Apr;184:107033. doi: 10.1016/j.neunet.2024.107033. Epub 2024 Dec 13.
The goal of space-time video super-resolution (STVSR) is to increase both the frame rate (also referred to as the temporal resolution) and the spatial resolution of a given video. Recent approaches solve STVSR with end-to-end deep neural networks. A popular solution is to first increase the frame rate of the video, then perform feature refinement among different frame features, and finally increase the spatial resolutions of these features. The temporal correlation among features of different frames is carefully exploited in this process. The spatial correlation among features of different (spatial) resolutions, despite also being very important, has received far less attention. In this paper, we propose a spatial-temporal feature interaction network to enhance STVSR by exploiting both spatial and temporal correlations among features of different frames and spatial resolutions. Specifically, a spatial-temporal frame interpolation module is introduced to interpolate low- and high-resolution intermediate frame features simultaneously and interactively. Spatial-temporal local and global refinement modules are then deployed to exploit the spatial-temporal correlation among different features for their refinement. Finally, a novel motion consistency loss is employed to enhance the motion continuity among reconstructed frames. We conduct experiments on three standard benchmarks, Vid4, Vimeo-90K and Adobe240, and the results demonstrate that our method improves upon state-of-the-art methods by a considerable margin. Our code will be available at https://github.com/yuezijie/STINet-Space-time-Video-Super-resolution.
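The two-stage pipeline the abstract describes (temporal upsampling first, spatial upsampling afterwards) can be sketched with naive placeholder operations. This is purely an illustration of the STVSR task structure, not the paper's learned modules: the averaging-based frame interpolation and nearest-neighbor spatial upsampling below stand in for the network's spatial-temporal interpolation and refinement stages.

```python
import numpy as np

def interpolate_frames(frames):
    """Temporal upsampling: insert one intermediate frame between each
    consecutive pair. Here we simply average the two neighbors; the paper's
    network instead predicts intermediate frame features interactively at
    low and high resolution."""
    out = []
    for prev, nxt in zip(frames[:-1], frames[1:]):
        out.append(prev)
        out.append((prev + nxt) / 2.0)
    out.append(frames[-1])
    return out

def upsample(frame, scale=4):
    """Spatial upsampling: nearest-neighbor replication as a stand-in for the
    learned super-resolution stage."""
    return np.repeat(np.repeat(frame, scale, axis=0), scale, axis=1)

# Three low-resolution 8x8 frames with constant intensities 0, 1, 2.
lr_frames = [np.full((8, 8), float(i)) for i in range(3)]

hfr_frames = interpolate_frames(lr_frames)       # 5 frames: doubled frame rate
hr_frames = [upsample(f) for f in hfr_frames]    # each frame becomes 32x32
```

After these two stages a 3-frame 8x8 clip becomes a 5-frame 32x32 clip, mirroring the joint temporal-and-spatial resolution increase that STVSR targets.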