IEEE Trans Pattern Anal Mach Intell. 2024 Jul;46(7):4579-4596. doi: 10.1109/TPAMI.2024.3356548. Epub 2024 Jun 5.
Almost all digital videos are coded into compact representations before being transmitted. Such compact representations must be decoded back to pixels before being displayed to humans and, typically, before being enhanced or analyzed by machine vision algorithms. Intuitively, it is more efficient to enhance or analyze the coded representations directly, without decoding them into pixels. Therefore, we propose a versatile neural video coding (VNVC) framework that learns compact representations supporting both reconstruction and direct enhancement/analysis, making it versatile for both human and machine vision. Our VNVC framework has a feature-based compression loop. In the loop, each frame is encoded into compact representations and decoded to an intermediate feature obtained before reconstruction. This intermediate feature serves as the reference for motion estimation and motion compensation, via feature-based temporal context mining and a cross-domain motion encoder-decoder, to compress subsequent frames. To evaluate its effectiveness, the intermediate feature is fed directly into video reconstruction, video enhancement, and video analysis networks. The evaluation shows that our framework, operating on the intermediate feature, achieves high compression efficiency for video reconstruction and satisfactory task performance at lower complexity.
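The sketch below illustrates the feature-based compression loop described above, assuming a simplified form: a frame is encoded into a compact latent conditioned on the previous intermediate feature, the latent is decoded back to an intermediate feature (not pixels), and that feature both serves as the next reference and feeds reconstruction or task heads directly. All module names, channel sizes, and the concatenation-based conditioning are illustrative assumptions, not the authors' actual VNVC architecture.

```python
# Hypothetical minimal sketch of a feature-based compression loop; not the paper's implementation.
import torch
import torch.nn as nn

class FrameEncoder(nn.Module):
    """Encodes a frame, conditioned on the reference feature, into a compact latent."""
    def __init__(self, feat_ch=64, latent_ch=96):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3 + feat_ch, 128, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(128, latent_ch, 3, stride=2, padding=1),
        )
    def forward(self, frame, ref_feat):
        return self.net(torch.cat([frame, ref_feat], dim=1))

class FeatureDecoder(nn.Module):
    """Decodes the compact latent into an intermediate feature (before reconstruction)."""
    def __init__(self, feat_ch=64, latent_ch=96):
        super().__init__()
        self.net = nn.Sequential(
            nn.ConvTranspose2d(latent_ch, 128, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(128, feat_ch, 4, stride=2, padding=1),
        )
    def forward(self, latent):
        return self.net(latent)

class ReconstructionHead(nn.Module):
    """Maps the intermediate feature to pixels, only needed for human viewing."""
    def __init__(self, feat_ch=64):
        super().__init__()
        self.net = nn.Conv2d(feat_ch, 3, 3, padding=1)
    def forward(self, feat):
        return self.net(feat)

class TaskHead(nn.Module):
    """Stand-in for an enhancement/analysis network fed directly with the feature."""
    def __init__(self, feat_ch=64, num_classes=10):
        super().__init__()
        self.net = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                 nn.Linear(feat_ch, num_classes))
    def forward(self, feat):
        return self.net(feat)

# Feature-based compression loop over a toy 4-frame clip.
enc, dec = FrameEncoder(), FeatureDecoder()
recon_head, task_head = ReconstructionHead(), TaskHead()
frames = torch.rand(4, 1, 3, 64, 64)       # T x B x C x H x W
ref_feat = torch.zeros(1, 64, 64, 64)       # empty reference for the first frame
for frame in frames:
    latent = enc(frame, ref_feat)           # compact representation (would be entropy-coded)
    feat = dec(latent)                      # intermediate feature, reused as the next reference
    pixels = recon_head(feat)               # reconstruct pixels only when humans need them
    logits = task_head(feat)                # machine-vision task runs on the feature directly
    ref_feat = feat
```

The key design point this sketch captures is that decoding stops at the intermediate feature: reconstruction to pixels becomes an optional head for human viewing, while enhancement and analysis networks consume the feature directly, avoiding the cost of full pixel decoding.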