Jia Chuanmin, Wang Shiqi, Zhang Xinfeng, Wang Shanshe, Liu Jiaying, Pu Shiliang, Ma Siwei
IEEE Trans Image Process. 2019 Jul;28(7):3343-3356. doi: 10.1109/TIP.2019.2896489. Epub 2019 Jan 31.
Recently, convolutional neural networks (CNNs) have attracted tremendous attention and achieved great success in many image processing tasks. In this paper, we combine CNN technology with image restoration to improve video coding performance and propose content-aware CNN-based in-loop filtering for High Efficiency Video Coding (HEVC). In particular, we quantitatively analyze the structure of the proposed CNN model along multiple dimensions to make the model interpretable and optimal for CNN-based loop filtering. More specifically, each coding tree unit (CTU) is treated as an independent region for processing, so that the proposed content-aware multimodel filtering mechanism restores different regions with different CNN models under the guidance of a discriminative network. To adapt to the image content, the discriminative neural network is trained to analyze the content characteristics of each region and adaptively select the deep learning model. CTU-level control is also enabled through rate-distortion optimization. To learn the CNN models, an iterative training method is proposed that simultaneously labels filter categories at the CTU level and fine-tunes the CNN model parameters. The CNN-based in-loop filter is applied after sample adaptive offset (SAO) in HEVC. Extensive experiments show that the proposed approach significantly improves coding performance, achieving up to 10.0% bit-rate reduction. On average, bit-rate reductions of 4.1%, 6.0%, 4.7%, and 6.0% are obtained under the all intra, low delay, low delay P, and random access configurations, respectively.
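The abstract combines three mechanisms: classifier-guided model selection per CTU, a CTU-level rate-distortion (RD) on/off decision, and training that alternates between labeling CTUs and fine-tuning models. The Python below is a toy sketch of how these pieces could fit together. It is not the authors' implementation, and it rests on several assumptions:

- CTUs are flat lists of samples.
- Distortion is the sum of squared errors (SSE).
- Each "CNN" is replaced by a hypothetical constant-offset filter.
- The discriminative network is replaced by a caller-supplied classifier.
- Rate counts only a per-CTU on/off flag, using caller-supplied bit estimates.
- CTUs are labeled with their lowest-distortion model. The abstract does not state this rule.

The names `offset_model`, `filter_ctu`, `label_ctus`, and `iterative_train` are illustrative only.

```python
"""Toy sketch: content-aware, CTU-level filtering with an RD on/off decision.

Everything here is a hypothetical stand-in: CTUs are flat lists of samples,
each "CNN" is a constant-offset filter, and the caller supplies the classifier.
"""
from typing import Callable, List, Sequence, Tuple

Block = List[float]
Model = Callable[[Block], Block]


def sse(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of squared errors, used as the distortion D."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def offset_model(c: float) -> Model:
    """Hypothetical stand-in for a trained CNN: adds offset c to every sample."""
    return lambda block: [x + c for x in block]


def filter_ctu(orig: Block, recon: Block, models: Sequence[Model],
               classify: Callable[[Block], int], lam: float,
               bits_on: float = 1.0, bits_off: float = 1.0) -> Tuple[Block, bool, int]:
    """Classifier picks a model; keep its output only if J = D + lam * R is lower."""
    idx = classify(recon)                  # content-aware model selection
    filtered = models[idx](recon)
    j_on = sse(orig, filtered) + lam * bits_on
    j_off = sse(orig, recon) + lam * bits_off
    use = j_on < j_off                     # CTU-level on/off flag
    return (filtered if use else recon), use, idx


def label_ctus(origs: Sequence[Block], recons: Sequence[Block],
               models: Sequence[Model]) -> List[int]:
    """Labeling step (assumed rule): tag each CTU with its lowest-SSE model."""
    labels = []
    for o, r in zip(origs, recons):
        costs = [sse(o, m(r)) for m in models]
        labels.append(costs.index(min(costs)))
    return labels


def iterative_train(origs: Sequence[Block], recons: Sequence[Block],
                    init_offsets: Sequence[float],
                    rounds: int = 5) -> Tuple[List[float], List[int]]:
    """Alternate CTU labeling with per-model refitting (a stand-in for fine-tuning)."""
    offsets = list(init_offsets)
    labels: List[int] = []
    for _ in range(rounds):
        labels = label_ctus(origs, recons, [offset_model(c) for c in offsets])
        for k in range(len(offsets)):
            # Least-squares optimal offset = mean residual over CTUs labeled k.
            diffs = [o_i - r_i
                     for o, r, lab in zip(origs, recons, labels) if lab == k
                     for o_i, r_i in zip(o, r)]
            if diffs:                      # keep the old offset if no CTU chose k
                offsets[k] = sum(diffs) / len(diffs)
    return offsets, labels


if __name__ == "__main__":
    origs = [[10.0, 12.0], [10.0, 10.0], [50.0, 52.0], [50.0, 50.0]]
    recons = [[8.0, 10.0], [8.0, 8.0], [53.0, 55.0], [53.0, 53.0]]
    offsets, labels = iterative_train(origs, recons, [1.0, -1.0])
    print(offsets, labels)                 # [2.0, -3.0] [0, 0, 1, 1]
    models = [offset_model(c) for c in offsets]
    # Hypothetical "discriminative network": route dark vs. bright CTUs.
    classify = lambda block: 0 if sum(block) / len(block) < 30 else 1
    print(filter_ctu(origs[0], recons[0], models, classify, lam=10.0))  # ([10.0, 12.0], True, 0)
```

When the two flag-cost estimates are equal, the on/off choice reduces to a distortion comparison. Unequal `bits_on`/`bits_off` values, for example estimates from an entropy coder, let λ tip the decision. In a real system, the mean-residual refit would be replaced by gradient-based fine-tuning of each CNN on the CTUs currently labeled with it.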