Wang Juan, Duan Yiping, Tao Xiaoming, Xu Mai, Lu Jianhua
IEEE Trans Image Process. 2021;30:4225-4237. doi: 10.1109/TIP.2021.3065244. Epub 2021 Apr 12.
Existing image compression methods usually choose or optimize low-level representations manually. As a result, these methods struggle with texture restoration at low bit rates. Recently, deep neural network (DNN)-based image compression methods have achieved impressive results. To achieve better perceptual quality, generative models are widely used, especially generative adversarial networks (GANs). However, training GANs is intractable, especially for high-resolution images, with the challenges of unconvincing reconstructions and unstable training. To overcome these problems, we propose a novel DNN-based image compression framework in this paper. The key idea is to decompose an image into multi-scale sub-images using the proposed Laplacian-pyramid-based multi-scale networks. For each pyramid scale, we train a dedicated DNN to learn the compressive representation. Meanwhile, each scale is optimized with respect to different aspects, including pixels, semantics, distribution, and entropy, to achieve a good "rate-distortion-perception" trade-off. By optimizing each pyramid scale independently, we keep each stage manageable and make each reconstructed sub-image plausible. Experimental results demonstrate that our method achieves state-of-the-art performance, with advantages over existing methods in visual quality. Additionally, better performance on downstream visual analysis tasks conducted on the reconstructed images validates the strong semantics-preserving ability of the proposed method.
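The abstract's core idea of decomposing an image into multi-scale sub-images rests on the classic Laplacian pyramid. The following is a minimal sketch of that decomposition and its inverse, assuming OpenCV; the function names and level count are illustrative, and the paper's learned per-scale networks would operate on the resulting sub-images rather than on this exact construction.

```python
import numpy as np
import cv2


def laplacian_pyramid(img: np.ndarray, levels: int = 3):
    """Decompose an image into band-pass sub-images plus a coarse low-pass scale."""
    pyramid = []
    current = img.astype(np.float32)
    for _ in range(levels):
        down = cv2.pyrDown(current)  # low-pass filter and downsample
        up = cv2.pyrUp(down, dstsize=(current.shape[1], current.shape[0]))
        pyramid.append(current - up)  # band-pass residual at this scale
        current = down
    pyramid.append(current)  # coarsest low-pass sub-image
    return pyramid


def reconstruct(pyramid):
    """Invert the decomposition: upsample and add residuals back, coarse to fine."""
    current = pyramid[-1]
    for residual in reversed(pyramid[:-1]):
        current = cv2.pyrUp(current, dstsize=(residual.shape[1], residual.shape[0]))
        current = current + residual
    return current
```

In this reading, each element of `pyramid` would be compressed by its own scale-specific network, which is consistent with the abstract's claim that optimizing scales independently keeps every stage manageable.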