
Fast and High-Performance Learned Image Compression With Improved Checkerboard Context Model, Deformable Residual Module, and Knowledge Distillation.

Author Information

Fu Haisheng, Liang Feng, Liang Jie, Wang Yongqiang, Fang Zhenman, Zhang Guohe, Han Jingning

Publication Information

IEEE Trans Image Process. 2024;33:4702-4715. doi: 10.1109/TIP.2024.3445737. Epub 2024 Aug 30.

Abstract

Deep learning-based image compression has made great progress recently. However, some leading schemes use a serial context-adaptive entropy model to improve the rate-distortion (R-D) performance, which is very slow. In addition, the complexities of the encoding and decoding networks are quite high, making them unsuitable for many practical applications. In this paper, we propose four techniques to balance the trade-off between complexity and performance. First, we introduce the deformable residual module to remove more redundancies from the input image, thereby enhancing compression performance. Second, we design an improved checkerboard context model with two separate distribution-parameter estimation networks and different probability models, which enables parallel decoding without sacrificing performance compared to the sequential context-adaptive model. Third, we develop a three-pass knowledge distillation scheme to retrain the decoder and entropy coding and to reduce the complexity of the core decoder network; the scheme transfers both the final and intermediate results of the teacher network to the student network to improve its performance. Fourth, we introduce L1 regularization to make the numerical values of the latent representation more sparse, and we encode only the non-zero channels during encoding and decoding to reduce the bit rate. This also reduces the encoding and decoding time. Experiments show that, compared to the state-of-the-art learned image coding scheme, our method is about 20 times faster in encoding and 70-90 times faster in decoding, while its R-D performance is also 2.3% higher. Our method achieves better rate-distortion performance than classical image codecs, including H.266/VVC-intra (4:4:4), and some recent learned methods, as measured by both PSNR and MS-SSIM on the Kodak and Tecnick-40 datasets.
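The checkerboard context model described above replaces autoregressive, pixel-by-pixel decoding with two parallel passes: "anchor" latents at one parity of the checkerboard are decoded first from hyperprior-only parameters, and the remaining "non-anchor" latents are then decoded with parameters conditioned on the already-decoded anchors. The following is a minimal illustrative sketch of that two-pass control flow; the function names, the callback interfaces, and the stand-in parameter networks are assumptions for illustration, not the paper's actual architecture.

```python
import numpy as np

def checkerboard_masks(h, w):
    # Anchor positions: (i + j) even; non-anchor positions: (i + j) odd.
    ii, jj = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    anchor = (ii + jj) % 2 == 0
    return anchor, ~anchor

def two_pass_decode(h, w, anchor_params, nonanchor_params, decode_symbol):
    """Two-pass parallel decoding sketch (hypothetical interface).

    Pass 1: all anchor latents are decoded at once; their entropy
    parameters come from the hyperprior alone (no spatial context).
    Pass 2: all non-anchor latents are decoded at once; their entropy
    parameters are conditioned on the decoded anchors, so every
    position within a pass can be processed in parallel.
    """
    y = np.zeros((h, w))
    anchor, nonanchor = checkerboard_masks(h, w)

    # Pass 1: anchors, parameters independent of y.
    mu1, sigma1 = anchor_params()
    y[anchor] = decode_symbol(mu1[anchor], sigma1[anchor])

    # Pass 2: non-anchors, parameters computed from the decoded anchors
    # (non-anchor positions are masked out of the context input).
    mu2, sigma2 = nonanchor_params(y * anchor)
    y[nonanchor] = decode_symbol(mu2[nonanchor], sigma2[nonanchor])
    return y
```

In a real codec, `decode_symbol` would invoke an arithmetic/range decoder with the Gaussian parameters, and the two parameter callbacks would be the paper's two separate estimation networks; here they are left abstract to show only the parallel two-pass structure.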

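The fourth technique, L1-induced sparsity with channel skipping, can be pictured as follows: after training with an L1 penalty on the latents, many channels of the quantized latent tensor are entirely zero, so the codec entropy-codes only the active channels plus a small per-channel mask telling the decoder which channels were kept. This is a minimal sketch of that bookkeeping under assumed names and an illustrative activity threshold; it is not the paper's implementation.

```python
import numpy as np

def encode_nonzero_channels(y, thresh=0.0):
    """Select the channels of a quantized latent y (shape C, H, W)
    that carry any non-zero values. Returns the kept channels and a
    boolean mask to be signaled to the decoder. Names/threshold are
    illustrative assumptions."""
    keep = np.abs(y).sum(axis=(1, 2)) > thresh  # per-channel activity
    return y[keep], keep

def decode_nonzero_channels(y_kept, keep, h, w):
    """Re-expand the kept channels into a full (C, H, W) latent,
    filling the skipped channels with zeros."""
    y = np.zeros((keep.size, h, w), dtype=y_kept.dtype)
    y[keep] = y_kept
    return y
```

Skipping all-zero channels saves both the bits that would encode them and the arithmetic-coding time spent on them, which is why the abstract notes that this step reduces encoding and decoding time as well as bit rate.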
