Fu Haisheng, Liang Feng, Liang Jie, Wang Yongqiang, Fang Zhenman, Zhang Guohe, Han Jingning
IEEE Trans Image Process. 2024;33:4702-4715. doi: 10.1109/TIP.2024.3445737. Epub 2024 Aug 30.
Deep learning-based image compression has made great progress recently. However, some leading schemes use a serial context-adaptive entropy model to improve rate-distortion (R-D) performance, which makes them very slow. In addition, the complexities of the encoding and decoding networks are quite high, making them unsuitable for many practical applications. In this paper, we propose four techniques to balance the trade-off between complexity and performance. First, we introduce a deformable residual module to remove more redundancies from the input image, thereby enhancing compression performance. Second, we design an improved checkerboard context model with two separate distribution-parameter estimation networks and different probability models, which enables parallel decoding without sacrificing performance compared to the sequential context-adaptive model. Third, we develop a three-pass knowledge distillation scheme that retrains the decoder and entropy coding and reduces the complexity of the core decoder network; the scheme transfers both the final and intermediate results of the teacher network to the student network to improve its performance. Fourth, we introduce L1 regularization to make the latent representation sparser, and we encode only the non-zero channels during encoding and decoding to reduce the bit rate; this also reduces the encoding and decoding time. Experiments show that, compared to the state-of-the-art learned image coding scheme, our method is about 20 times faster in encoding and 70-90 times faster in decoding, while our R-D performance is also 2.3% higher. Our method achieves better rate-distortion performance than classical image codecs, including H.266/VVC-intra (4:4:4), and some recent learned methods, as measured by both PSNR and MS-SSIM on the Kodak and Tecnick-40 datasets.
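The sketches below illustrate the four techniques summarized in the abstract. First, the deformable residual module: a minimal sketch assuming torchvision's DeformConv2d; the layer widths, the offset predictor, and the block layout are illustrative assumptions, not the paper's exact design.

```python
# Hypothetical deformable residual block: deformable sampling gathers
# non-local redundancy before the residual addition.
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d

class DeformableResBlock(nn.Module):
    def __init__(self, ch=192, k=3):
        super().__init__()
        # Two offsets (dx, dy) per kernel tap, predicted from the input itself.
        self.offset = nn.Conv2d(ch, 2 * k * k, k, padding=k // 2)
        self.dconv = DeformConv2d(ch, ch, k, padding=k // 2)
        self.act = nn.ReLU(inplace=True)
        self.conv = nn.Conv2d(ch, ch, k, padding=k // 2)

    def forward(self, x):
        y = self.act(self.dconv(x, self.offset(x)))
        return x + self.conv(y)  # residual connection
```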
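Second, the improved checkerboard context model. The sketch below shows the two-pass idea with two separate parameter-estimation networks, as the abstract describes: anchor positions depend only on the hyperprior and decode fully in parallel, while non-anchor positions additionally see the decoded anchors through a spatial context convolution. The module and layer names are assumptions, and the abstract's "different probability models" per pass is simplified here to a single Gaussian for brevity.

```python
# Sketch of a two-pass checkerboard context model (PyTorch assumed).
import torch
import torch.nn as nn
import torch.nn.functional as F

def checkerboard_masks(h, w, device):
    """Anchor/non-anchor masks: anchors where (i + j) is even."""
    ij = torch.arange(h, device=device)[:, None] + torch.arange(w, device=device)[None, :]
    anchor = (ij % 2 == 0).float()   # decoded first, in parallel
    return anchor, 1.0 - anchor      # non-anchors decoded second

class CheckerboardContext(nn.Module):
    def __init__(self, latent_ch=192, hyper_ch=384):
        super().__init__()
        # Two *separate* distribution-parameter estimation networks, one per pass.
        self.param_net_anchor = nn.Sequential(
            nn.Conv2d(hyper_ch, 512, 1), nn.ReLU(inplace=True),
            nn.Conv2d(512, 2 * latent_ch, 1))                 # -> (mu, sigma)
        self.ctx = nn.Conv2d(latent_ch, hyper_ch, 5, padding=2)  # spatial context
        self.param_net_nonanchor = nn.Sequential(
            nn.Conv2d(2 * hyper_ch, 512, 1), nn.ReLU(inplace=True),
            nn.Conv2d(512, 2 * latent_ch, 1))

    def forward(self, y_hat, hyper):
        n, c, h, w = y_hat.shape
        anchor, non_anchor = checkerboard_masks(h, w, y_hat.device)
        # Pass 1: anchors depend only on the hyperprior -> fully parallel.
        mu_a, sigma_a = self.param_net_anchor(hyper).chunk(2, dim=1)
        # Pass 2: non-anchors also condition on the decoded anchors.
        ctx = self.ctx(y_hat * anchor)
        mu_n, sigma_n = self.param_net_nonanchor(
            torch.cat([hyper, ctx], dim=1)).chunk(2, dim=1)
        mu = mu_a * anchor + mu_n * non_anchor
        # Softplus keeps the Gaussian scale positive.
        sigma = F.softplus(sigma_a) * anchor + F.softplus(sigma_n) * non_anchor
        return mu, sigma
```

Because each pass is a single dense network evaluation over all positions at once, decoding needs only two entropy-coding passes instead of one per latent position, which is the source of the large decoding speedup over serial context models.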
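Third, the knowledge distillation objective. The sketch below shows a loss that matches both intermediate decoder features and final reconstructions between teacher and student, in the spirit of the abstract; the loss weights, the 1x1 channel adapters, and the use of MSE are assumptions, and the three-pass retraining schedule itself is not shown.

```python
# Hypothetical distillation loss combining final and intermediate supervision.
import torch
import torch.nn.functional as F

def distillation_loss(student_feats, teacher_feats, x_student, x_teacher,
                      x_orig, adapters, alpha=1.0, beta=0.1):
    """student_feats/teacher_feats: lists of intermediate decoder features.
    adapters: 1x1 convs mapping thinner student channels to teacher widths."""
    # Intermediate supervision: student features chase frozen teacher features.
    feat_loss = sum(F.mse_loss(adapt(s), t.detach())
                    for adapt, s, t in zip(adapters, student_feats, teacher_feats))
    # Final supervision: reconstruct the input and track the teacher output.
    out_loss = F.mse_loss(x_student, x_orig) + F.mse_loss(x_student, x_teacher.detach())
    return alpha * out_loss + beta * feat_loss
```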
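Fourth, the sparsity idea: an L1 penalty pushes latent channels toward zero during training, and channels whose quantized coefficients are all zero can be skipped at coding time, with only a per-channel significance map signaled. A minimal sketch, with illustrative names and an assumed penalty weight:

```python
# Sketch of L1-sparsified latents with non-zero-channel selection.
import torch

def l1_latent_penalty(y, weight=1e-4):
    # Added to the R-D objective: loss = rate + lam * distortion + penalty.
    return weight * y.abs().mean()

def nonzero_channels(y_hat):
    """Boolean (N, C) map of channels with any non-zero quantized coefficient;
    only these channels are passed to the entropy coder."""
    energy = y_hat.abs().flatten(2).sum(-1)   # (N, C): per-channel L1 energy
    return energy > 0
```

Skipping all-zero channels both lowers the bit rate and shortens encoding and decoding, since the entropy coder processes fewer symbols.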