Chen Peilin, Yang Wenhan, Wang Meng, Sun Long, Hu Kangkang, Wang Shiqi
IEEE Trans Image Process. 2021;30:7156-7169. doi: 10.1109/TIP.2021.3101826. Epub 2021 Aug 12.
Real-world video processing algorithms often face the challenge of handling compressed videos rather than pristine ones. Despite the tremendous successes of deep-learning-based video super-resolution (SR), much less work has been dedicated to the SR of compressed videos. Herein, we propose a novel approach for compressed-domain deep video SR that jointly leverages coding priors and deep priors. By exploiting the diverse, ready-made spatial and temporal coding priors (e.g., partition maps and motion vectors) extracted directly from the video bitstream at negligible cost, video SR in the compressed domain can accurately reconstruct the high-resolution video with high flexibility and substantially reduced computational complexity. More specifically, to incorporate the spatial coding prior, the Guided Spatial Feature Transform (GSFT) layer is proposed to modulate features of the prior under the guidance of the video information, making the prior features more fine-grained and content-adaptive. To incorporate the temporal coding prior, a guided soft alignment scheme is designed to generate local attention offsets that compensate for the decoded motion vectors. Our soft alignment scheme combines the merits of explicit and implicit motion modeling, rendering feature alignment for SR more effective in terms of computational complexity and robustness to inaccurate motion fields. Furthermore, to make full use of the deep priors, multi-scale fused features are generated by a scale-wise convolutional reconstruction network for the final SR video reconstruction. To promote compressed-domain video SR research, we build an extensive Compressed Videos with Coding Prior (CVCP) dataset, comprising compressed videos of diverse content together with the various coding priors extracted from their bitstreams. Extensive experimental results demonstrate the effectiveness of coding priors in compressed-domain video SR.
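The GSFT layer described above follows the general spatial-feature-transform idea: a conditioning branch turns the coding prior (e.g., a partition map) into per-pixel scale and shift maps, which then affinely modulate the feature tensor. Below is a minimal NumPy sketch of that modulation step only, not the paper's actual architecture; the function name `gsft_modulate` and the toy shapes are illustrative assumptions, and in the real network the `gamma`/`beta` maps would be produced by a small guided conditioning sub-network rather than supplied directly.

```python
import numpy as np

def gsft_modulate(feat, gamma, beta):
    """Spatial feature transform: per-pixel affine modulation.

    feat, gamma, beta: (C, H, W) arrays. In a GSFT-style layer,
    gamma and beta would come from a conditioning network fed with
    the coding prior (e.g., a partition map) and guided by the video
    features; here they are plain inputs for illustration.
    """
    return gamma * feat + beta

# Toy example: a prior-derived scale map amplifies the features.
rng = np.random.default_rng(0)
feat = rng.standard_normal((4, 8, 8))   # video features (C, H, W)
gamma = np.full_like(feat, 1.5)         # scale map from the prior
beta = np.zeros_like(feat)              # shift map from the prior
out = gsft_modulate(feat, gamma, beta)
```

Because the modulation is spatially varying, regions belonging to different coding partitions can be transformed differently, which is what makes the prior features content-adaptive.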
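The guided soft alignment scheme can be pictured as warping a neighboring frame's features by a coarse flow field built from the decoded motion vectors, refined by small learned offsets. The NumPy sketch below shows only that warping step under simplifying assumptions (single-channel features, zero learned offsets); `bilinear_warp` is an illustrative helper, not the paper's implementation, which additionally uses local attention to weigh candidate positions.

```python
import numpy as np

def bilinear_warp(feat, flow):
    """Warp an (H, W) feature map by a per-pixel displacement field.

    flow: (2, H, W) array, flow[0] = horizontal, flow[1] = vertical
    displacement in pixels. Sampling positions are clamped to the
    image borders and interpolated bilinearly.
    """
    H, W = feat.shape
    ys, xs = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    sx = np.clip(xs + flow[0], 0, W - 1)
    sy = np.clip(ys + flow[1], 0, H - 1)
    x0 = np.floor(sx).astype(int); x1 = np.minimum(x0 + 1, W - 1)
    y0 = np.floor(sy).astype(int); y1 = np.minimum(y0 + 1, H - 1)
    wx, wy = sx - x0, sy - y0
    return ((1 - wy) * (1 - wx) * feat[y0, x0]
            + (1 - wy) * wx * feat[y0, x1]
            + wy * (1 - wx) * feat[y1, x0]
            + wy * wx * feat[y1, x1])

# Decoded motion vectors give a coarse flow; a learned offset would
# compensate for their inaccuracy (set to zero in this toy example).
mv = np.zeros((2, 4, 4)); mv[0] += 1.0   # coarse MV: one pixel right
offset = np.zeros((2, 4, 4))             # learned compensating offset
neighbor = np.arange(16, dtype=float).reshape(4, 4)
aligned = bilinear_warp(neighbor, mv + offset)
```

Reusing the bitstream's motion vectors as the coarse flow is what saves computation relative to estimating motion from scratch, while the learned offsets provide robustness when those vectors are inaccurate.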