IEEE Trans Image Process. 2024;33:408-422. doi: 10.1109/TIP.2023.3343912. Epub 2023 Dec 29.
The accelerated proliferation of visual content and the rapid development of machine vision technologies pose significant challenges for delivering visual data at a massive scale, because such data must be represented efficiently to satisfy both human and machine requirements. In this work, we investigate how hierarchical representations derived from an advanced generative prior facilitate the construction of an efficient scalable coding paradigm for human-machine collaborative vision. Our key insight is that, by exploiting the StyleGAN prior, we can learn three-layered representations that encode hierarchical semantics. These representations are elaborately designed as the basic, middle, and enhanced layers, which support machine intelligence and human visual perception in a progressive fashion. To achieve efficient compression, we propose a layer-wise scalable entropy transformer that reduces the redundancy between layers. Under a multi-task scalable rate-distortion objective, the proposed scheme is jointly optimized for machine analysis performance, human perceptual experience, and compression ratio. We validate the feasibility of the proposed paradigm on face image compression. Extensive qualitative and quantitative experimental results demonstrate that the proposed paradigm outperforms the latest compression standard, Versatile Video Coding (VVC), in terms of both machine analysis and human perception at extremely low bitrates (< 0.01 bpp), offering new insights for human-machine collaborative compression.
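To make the bitrate figure and the idea of a multi-task rate-distortion objective concrete, the sketch below computes bits per pixel (bpp) and a generic weighted-sum objective over three layers. The abstract does not give the exact formulation, so the weighted-sum form, the names `lambda_machine` and `lambda_human`, the summation of per-layer rates, and the 256x256 resolution in the usage note are all assumptions for illustration, not the authors' method.

```python
def bits_per_pixel(total_bits: float, height: int, width: int) -> float:
    """Bitrate normalized by the number of pixels (bpp)."""
    return total_bits / (height * width)


def scalable_rd_loss(rates, machine_distortion, human_distortion,
                     lambda_machine, lambda_human):
    """Hypothetical multi-task scalable rate-distortion objective.

    Assumption: `rates` holds the bits spent on each layer
    (e.g. basic, middle, enhanced), and the total rate is their sum.
    The machine-analysis and human-perception distortions are weighted
    by assumed trade-off coefficients and added to the total rate.
    """
    total_rate = sum(rates)
    return (total_rate
            + lambda_machine * machine_distortion
            + lambda_human * human_distortion)


if __name__ == "__main__":
    # A 256x256 image coded with 600 bits lands below 0.01 bpp.
    print(bits_per_pixel(600, 256, 256))
    # Three layers of 100, 200 and 300 bits with weighted distortions.
    print(scalable_rd_loss([100, 200, 300], 0.5, 0.25, 2.0, 4.0))
```

For scale: at 256x256 pixels (65,536 pixels), staying under 0.01 bpp means spending fewer than about 655 bits on the whole image, which illustrates how compact the layered representation has to be in this regime.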