Qiu Zhongwei, Yang Huan, Fu Jianlong, Liu Daochang, Xu Chang, Fu Dongmei
IEEE Trans Pattern Anal Mach Intell. 2023 Dec;45(12):14888-14904. doi: 10.1109/TPAMI.2023.3312166. Epub 2023 Nov 3.
Video Super-Resolution (VSR) aims to restore high-resolution (HR) videos from low-resolution (LR) videos. Existing VSR techniques usually recover HR frames by extracting pertinent textures from nearby frames, assuming known degradation processes. Despite significant progress, major challenges remain in effectively extracting and transmitting high-quality textures from highly degraded low-quality sequences affected by blur, additive noise, and compression artifacts. This work proposes a novel degradation-robust Frequency-Transformer (FTVSR++) for handling low-quality videos, which carries out self-attention in a combined space-time-frequency domain. First, video frames are split into patches and each patch is transformed into spectral maps in which each channel represents a frequency band. This permits fine-grained self-attention on each frequency band so that real visual texture can be distinguished from artifacts. Second, a novel dual frequency attention (DFA) mechanism is proposed to capture global and local frequency relations, which can handle a variety of complex degradation processes in real-world scenarios. Third, we explore different self-attention schemes for video processing in the frequency domain and find that a "divided attention" scheme, which conducts joint space-frequency attention before applying temporal-frequency attention, leads to the best video enhancement quality. Extensive experiments on three widely used VSR datasets show that FTVSR++ outperforms state-of-the-art methods on different low-quality videos by clear visual margins.
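The first step described above (patches turned into spectral maps with one channel per frequency band) can be illustrated with a minimal sketch. The abstract does not name the transform or the patch size, so the block-wise 2D DCT, the 8x8 patch, and the function names `dct_matrix` and `frame_to_spectral_maps` are assumptions made for illustration only, not the authors' implementation.

```python
import numpy as np


def dct_matrix(n):
    """Orthonormal DCT-II basis matrix of size n x n (assumed transform)."""
    k = np.arange(n)[:, None]  # frequency index (rows)
    i = np.arange(n)[None, :]  # sample index (columns)
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    m[0, :] = np.sqrt(1.0 / n)  # DC row uses a different normalisation
    return m


def frame_to_spectral_maps(frame, patch=8):
    """Split an (H, W) frame into patch x patch blocks and DCT each block.

    Returns an array of shape (patch * patch, H // patch, W // patch),
    so each channel holds one frequency band across all patches.
    """
    h, w = frame.shape
    if h % patch or w % patch:
        raise ValueError("frame size must be a multiple of the patch size")
    d = dct_matrix(patch)
    # Rearrange to (rows of blocks, cols of blocks, patch, patch).
    blocks = frame.reshape(h // patch, patch, w // patch, patch)
    blocks = blocks.transpose(0, 2, 1, 3)
    # Separable 2D DCT applied to every block via broadcasting.
    coeffs = d @ blocks @ d.T
    # Flatten the patch x patch coefficients into a channel axis.
    maps = coeffs.reshape(h // patch, w // patch, patch * patch)
    return maps.transpose(2, 0, 1)


if __name__ == "__main__":
    frame = np.random.default_rng(0).random((32, 48))
    maps = frame_to_spectral_maps(frame)
    print(maps.shape)  # one channel per frequency band
```

In this layout, attention over the channel axis compares frequency bands, while attention over the two spatial axes compares patches, which is the kind of separation the abstract's space-frequency attention relies on. Because the assumed transform is orthonormal, it preserves the energy of the frame and can be inverted exactly.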