SSSIC: Semantics-to-Signal Scalable Image Coding With Learned Structural Representations.

Publication information

IEEE Trans Image Process. 2021;30:8939-8954. doi: 10.1109/TIP.2021.3121131. Epub 2021 Oct 29.

DOI:10.1109/TIP.2021.3121131

Abstract

We address the requirement of image coding for joint human-machine vision, i.e., the decoded image serves both human observation and machine analysis/understanding. Human vision and machine vision have previously been studied extensively through image (signal) compression and (image) feature compression, respectively. Recently, for joint human-machine vision, several studies have been devoted to joint compression of images and features, but the correlation between images and features is still unclear. We identify the deep network as a powerful toolkit for generating structural image representations. From the perspective of information theory, the deep features of an image naturally form an entropy-decreasing series: a scalable bitstream is achieved by compressing the features backward from a deeper layer to a shallower layer until culminating with the image signal. Moreover, we can obtain learned representations by training the deep network for a given semantic analysis task or multiple tasks and acquire deep features that are related to semantics. With the learned structural representations, we propose SSSIC, a framework to obtain an embedded bitstream that can be either partially decoded for semantic analysis or fully decoded for human vision. We implement an exemplar SSSIC scheme using coarse-to-fine image classification as the driving semantic analysis task. We also extend the scheme to object detection and instance segmentation tasks. The experimental results demonstrate the effectiveness of the proposed SSSIC framework and establish that the exemplar scheme achieves higher compression efficiency than separate compression of images and features.
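The entropy argument in the abstract can be illustrated with a toy sketch. This is not the SSSIC scheme itself: the two integer quantizers `shallow_layer` and `deep_layer` are hypothetical stand-ins for network layers, and the "images" are random 8-bit values. The sketch only assumes that each representation is a deterministic function of the previous one, in which case the empirical entropies cannot increase with depth, and the ideal cost of coding deep-to-shallow (base layer plus conditional enhancement layers) sums to the entropy of the signal.

```python
import math
import random
from collections import Counter


def entropy(samples):
    """Empirical Shannon entropy (in bits) of a list of hashable symbols."""
    counts = Counter(samples)
    n = len(samples)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def conditional_entropy(xs, ys):
    """Empirical H(X | Y), computed as H(X, Y) - H(Y)."""
    return entropy(list(zip(xs, ys))) - entropy(ys)


# Hypothetical stand-ins for network layers (assumption, not from the paper):
# each is a deterministic, coarser function of the previous representation.
def shallow_layer(x):
    return x // 4   # toy "shallow feature": 64 possible values


def deep_layer(f):
    return f // 8   # toy "deep (semantic) feature": 8 possible values


random.seed(0)
signal = [random.randrange(256) for _ in range(10_000)]  # toy 8-bit "images"
shallow = [shallow_layer(x) for x in signal]
deep = [deep_layer(f) for f in shallow]

h_signal = entropy(signal)
h_shallow = entropy(shallow)
h_deep = entropy(deep)

# Ideal per-layer cost of an embedded bitstream coded from deep to shallow.
base = conditional_entropy(deep, [0] * len(deep))   # equals H(deep)
enh_1 = conditional_entropy(shallow, deep)          # H(shallow | deep)
enh_2 = conditional_entropy(signal, shallow)        # H(signal | shallow)

print(f"H(signal)={h_signal:.3f}  H(shallow)={h_shallow:.3f}  H(deep)={h_deep:.3f}")
print(f"base + enhancements = {base + enh_1 + enh_2:.3f} bits per sample")
```

In this toy setting, a decoder that stops after the base layer recovers only the coarse (semantic) value, while one that reads all three layers recovers the signal exactly, at no extra ideal cost compared with coding the signal alone.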

