Zhang Mingjin, Bai Haichen, Shang Wenteng, Guo Jie, Li Yunsong, Gao Xinbo
IEEE Trans Neural Netw Learn Syst. 2025 Feb;36(2):2410-2422. doi: 10.1109/TNNLS.2024.3354982. Epub 2025 Feb 6.
Deep learning methods have achieved impressive performance in compressed video quality enhancement. However, these methods rely heavily on practical experience, with network structures designed by hand, and they do not fully exploit the feature information contained in video sequences. In particular, they do not take full advantage of the multiscale similarity of compression artifacts, and they do not adequately consider the impact of partition boundaries in the compressed video on overall video quality. In this article, we propose a novel Mixed Difference Equation inspired Transformer (MDEformer) for compressed video quality enhancement, which provides a relatively reliable principle to guide network design and yields new insight into interpretable transformers. Specifically, drawing on the graphical concept of the mixed difference equation (MDE), we use multiple cross-layer cross-attention aggregation (CCA) modules to establish long-range dependencies between the encoders and decoders of the transformer, with partition boundary smoothing (PBS) modules inserted as feedforward networks. The CCA module exploits the multiscale similarity of compression artifacts to remove those artifacts effectively and to recover the texture and detail of each frame. The PBS module leverages the sensitivity of smoothing convolution to partition boundaries to suppress the impact of those boundaries on the quality of the compressed video, improving overall quality without unduly affecting non-boundary pixels. Extensive experiments on the MFQE 2.0 dataset demonstrate that the proposed MDEformer removes compression artifacts and improves the quality of compressed video, surpassing state-of-the-art (SOTA) methods in both objective metrics and visual quality.
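The two ideas named in the abstract, cross-attention between encoder and decoder features and smoothing restricted to partition boundaries, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the single-head attention, the fixed [1, 2, 1]/4 kernel, and the 8-pixel block size are all assumptions chosen for illustration, whereas the CCA and PBS modules in the paper are learned network components.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(decoder_tokens, encoder_tokens, w_q, w_k, w_v):
    """Single-head cross-attention (hypothetical stand-in for one CCA step).

    Queries come from decoder tokens; keys and values come from encoder
    tokens, so every decoder position can draw on every encoder position.
    """
    q = decoder_tokens @ w_q                     # (n_dec, d)
    k = encoder_tokens @ w_k                     # (n_enc, d)
    v = encoder_tokens @ w_v                     # (n_enc, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])      # scaled dot-product scores
    weights = softmax(scores, axis=-1)           # each row sums to 1
    return weights @ v, weights


def smooth_block_boundaries(frame, block=8):
    """Apply a [1, 2, 1]/4 filter only to pixels adjacent to block boundaries.

    Assumes a 2-D grayscale frame and a regular block grid; pixels that do
    not touch a boundary are left unchanged.
    """
    out = frame.astype(np.float64).copy()
    h, w = out.shape

    # Horizontal pass: the two columns on either side of each vertical boundary.
    padded = np.pad(out, ((0, 0), (1, 1)), mode="edge")
    blurred = 0.25 * padded[:, :-2] + 0.5 * padded[:, 1:-1] + 0.25 * padded[:, 2:]
    cols = [c for b in range(block, w, block) for c in (b - 1, b)]
    out[:, cols] = blurred[:, cols]

    # Vertical pass: the two rows on either side of each horizontal boundary.
    padded = np.pad(out, ((1, 1), (0, 0)), mode="edge")
    blurred = 0.25 * padded[:-2, :] + 0.5 * padded[1:-1, :] + 0.25 * padded[2:, :]
    rows = [r for b in range(block, h, block) for r in (b - 1, b)]
    out[rows, :] = blurred[rows, :]
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dec = rng.standard_normal((4, 8))            # 4 decoder tokens, width 8
    enc = rng.standard_normal((6, 8))            # 6 encoder tokens, width 8
    w_q, w_k, w_v = (rng.standard_normal((8, 8)) for _ in range(3))
    fused, attn = cross_attention(dec, enc, w_q, w_k, w_v)
    print(fused.shape, attn.shape)

    frame = np.zeros((16, 16))
    frame[:, 8:] = 100.0                         # hard edge on a block boundary
    print(smooth_block_boundaries(frame)[0, 6:10])
```

In the toy frame, the step from 0 to 100 at the block boundary is softened to 25 and 75 in the two boundary columns, while all other columns keep their original values. That mirrors, in a very simplified form, the abstract's claim that boundary smoothing should leave non-boundary pixels largely untouched.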