Chen Tong, Liu Haojie, Ma Zhan, Shen Qiu, Cao Xun, Wang Yao
IEEE Trans Image Process. 2021;30:3179-3191. doi: 10.1109/TIP.2021.3058615. Epub 2021 Feb 25.
This article proposes an end-to-end learnt lossy image compression approach, which is built on top of the deep neural network (DNN)-based variational auto-encoder (VAE) structure with Non-Local Attention optimization and Improved Context modeling (NLAIC). Our NLAIC 1) embeds non-local network operations as non-linear transforms in both main and hyper coders for deriving respective latent features and hyperpriors by exploiting both local and global correlations, 2) applies an attention mechanism to generate implicit masks that are used to weigh the features for adaptive bit allocation, and 3) implements improved conditional entropy modeling of latent features using joint 3D convolutional neural network (CNN)-based autoregressive contexts and hyperpriors. Towards practical application, additional enhancements are also introduced to speed up the computational processing (e.g., parallel 3D CNN-based context prediction), decrease the memory consumption (e.g., sparse non-local processing) and reduce the implementation complexity (e.g., a unified model for variable rates without re-training). The proposed model outperforms existing learnt and conventional (e.g., BPG, JPEG2000, JPEG) image compression methods on both the Kodak and Tecnick datasets, achieving state-of-the-art compression efficiency under both PSNR and MS-SSIM quality measurements. We have made all materials publicly accessible at https://njuvision.github.io/NIC for reproducible research.
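The attention mechanism described in point 2) can be illustrated with a minimal sketch: a single-head non-local operation computes pairwise affinities across all spatial positions (capturing global correlations), and a sigmoid on the aggregated response yields an implicit mask in (0, 1) that weighs the features. This is not the authors' implementation; the function names, weight shapes, and the single-matrix flattened layout are simplifying assumptions for illustration only.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def non_local_attention_mask(x, w_theta, w_phi, w_g):
    """Simplified non-local block producing an implicit attention mask.

    x: (N, C) feature vectors, with spatial positions flattened to N.
    w_theta, w_phi, w_g: (C, C) linear embeddings (hypothetical names).
    Every position attends to every other position, so the mask reflects
    global as well as local correlations; multiplying the features by a
    mask in (0, 1) emulates adaptive bit allocation across positions.
    """
    theta, phi, g = x @ w_theta, x @ w_phi, x @ w_g
    affinity = softmax(theta @ phi.T, axis=-1)  # (N, N) global affinities
    mask = sigmoid(affinity @ g)                # implicit mask in (0, 1)
    return x * mask                             # attention-weighted features

rng = np.random.default_rng(0)
N, C = 16, 8  # 16 flattened spatial positions, 8 channels
x = rng.standard_normal((N, C))
w = [rng.standard_normal((C, C)) * 0.1 for _ in range(3)]
y = non_local_attention_mask(x, *w)
```

Because the mask is strictly inside (0, 1), every output feature is attenuated relative to its input, which is the sense in which the mask allocates more representational "budget" to salient positions and less elsewhere.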