

Video Coding for Machines: Compact Visual Representation Compression for Intelligent Collaborative Analytics.

Author information

Yang Wenhan, Huang Haofeng, Hu Yueyu, Duan Ling-Yu, Liu Jiaying

Publication information

IEEE Trans Pattern Anal Mach Intell. 2024 Jul;46(7):5174-5191. doi: 10.1109/TPAMI.2024.3367293. Epub 2024 Jun 5.

Abstract

As an emerging research practice that leverages recent advances in AI, e.g., deep-model-based prediction and generation, Video Coding for Machines (VCM) aims to bridge, to some extent, the separate research tracks of video/image compression and feature compression, and to jointly optimize compactness and efficiency from a unified perspective of high-accuracy machine vision and full-fidelity human vision. With the rapid advances in deep feature representation and visual data compression in mind, in this paper we summarize the VCM methodology and philosophy based on existing academic and industrial efforts. The development of VCM follows a general rate-distortion optimization, and we establish a categorization of key modules and techniques, including feature-assisted coding, scalable coding, intermediate feature compression/optimization, and machine-vision-targeted codecs, from broader perspectives such as vision tasks and analytics resources. Previous works show that, although existing approaches attempt to reveal the nature of scalable representations in bits when handling machine and human vision tasks, the generality of low-bit-rate representations, and accordingly how they can support a variety of visual analytics tasks, remains rarely studied. Therefore, we investigate a novel visual information compression problem for the analytics taxonomy, to strengthen the capability of compact visual representations extracted from multiple tasks for visual analytics. We revisit task relationships versus compression from a new perspective. Keeping in mind the transferability among different machine vision tasks (e.g., high-level semantic and mid-level geometry-related tasks), we aim to support multiple tasks jointly at low bit rates. In particular, to narrow the dimensionality gap between neural-network-generated features extracted from pixels and a variety of machine vision features/labels (e.g., scene classes, segmentation labels), we design a codebook hyperprior to compress the neural-network-generated features. As demonstrated in our experiments, this new hyperprior model is expected to improve feature compression efficiency by estimating the signal entropy more accurately, which enables further investigation of the granularity at which compact features are abstracted across different tasks.
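The abstract attributes the codebook hyperprior's gain to more accurate entropy estimation. The toy Python sketch below illustrates only that general principle and is not the authors' model. It assumes quantized integer features, a discretized zero-mean Gaussian entropy model, and a hypothetical hyperprior that transmits one fixed-length index per group of features into a small codebook of Gaussian scales. The function names, codebook values, and feature values are all illustrative assumptions.

```python
import math


def gaussian_cdf(x, scale):
    """CDF of a zero-mean Gaussian with standard deviation `scale`."""
    return 0.5 * (1.0 + math.erf(x / (scale * math.sqrt(2.0))))


def symbol_bits(y, scale):
    """Estimated bits for integer symbol y under a discretized zero-mean Gaussian."""
    p = gaussian_cdf(y + 0.5, scale) - gaussian_cdf(y - 0.5, scale)
    p = max(p, 1e-9)  # floor avoids log(0) for symbols far in the tails
    return -math.log2(p)


def factorized_rate(groups, scale):
    """Total bits when every feature shares one global scale (no side information)."""
    return sum(symbol_bits(y, scale) for g in groups for y in g)


def codebook_hyperprior_rate(groups, codebook):
    """Total bits for a hypothetical codebook hyperprior (illustration only).

    The encoder picks, per group, the codebook scale that minimizes the group's
    bits and sends its index as side information: log2(len(codebook)) bits
    per group with a fixed-length code.
    """
    index_bits = math.log2(len(codebook))
    total = 0.0
    for g in groups:
        best = min(sum(symbol_bits(y, s) for y in g) for s in codebook)
        total += index_bits + best
    return total


# Hypothetical quantized features: one near-silent group, one high-energy group.
groups = [[0, 0, 1, 0, 0, -1, 0, 0], [5, -7, 3, 9, -4, 6, -8, 2]]
codebook = [0.5, 1.0, 2.0, 6.0]  # hypothetical learned scales

print(f"factorized:          {factorized_rate(groups, 2.0):.1f} bits")
print(f"codebook hyperprior: {codebook_hyperprior_rate(groups, codebook):.1f} bits")
```

A single-entry codebook costs zero index bits and reduces to the factorized baseline; with heterogeneous groups, paying a few bits of side information per group buys a much tighter probability model, which is the intuition behind estimating entropy with a hyperprior.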

