VNVC: A Versatile Neural Video Coding Framework for Efficient Human-Machine Vision.

Publication information

IEEE Trans Pattern Anal Mach Intell. 2024 Jul;46(7):4579-4596. doi: 10.1109/TPAMI.2024.3356548. Epub 2024 Jun 5.

DOI: 10.1109/TPAMI.2024.3356548
PMID: 38252583

Abstract

Almost all digital videos are coded into compact representations before being transmitted. Such compact representations need to be decoded back to pixels before being displayed to humans and - as usual - before being enhanced/analyzed by machine vision algorithms. Intuitively, it is more efficient to enhance/analyze the coded representations directly without decoding them into pixels. Therefore, we propose a versatile neural video coding (VNVC) framework, which targets learning compact representations to support both reconstruction and direct enhancement/analysis, thereby being versatile for both human and machine vision. Our VNVC framework has a feature-based compression loop. In the loop, one frame is encoded into compact representations and decoded to an intermediate feature that is obtained before performing reconstruction. The intermediate feature can be used as reference in motion compensation and motion estimation through feature-based temporal context mining and cross-domain motion encoder-decoder to compress the following frames. The intermediate feature is directly fed into video reconstruction, video enhancement, and video analysis networks to evaluate its effectiveness. The evaluation shows that our framework with the intermediate feature achieves high compression efficiency for video reconstruction and satisfactory task performances with lower complexities.
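The loop described in the abstract can be illustrated with a deliberately simplified toy. This is only a sketch under stated assumptions, not the authors' networks: frames are short lists of numbers, the "compact representation" is a rounded residual, and the previous intermediate feature is used directly as the prediction in place of the learned motion estimation and temporal context mining. All function names (`encode`, `decode_to_feature`, `reconstruct`, `analyze`, `run_loop`) are hypothetical.

```python
from typing import Dict, List, Optional

Feature = List[float]


def encode(frame: List[float], ref: Optional[Feature]) -> List[int]:
    """Toy encoder (assumption): rounded residual against the reference feature."""
    pred = ref if ref is not None else [0.0] * len(frame)
    return [round(x - p) for x, p in zip(frame, pred)]


def decode_to_feature(code: List[int], ref: Optional[Feature]) -> Feature:
    """Toy decoder (assumption): yields the intermediate feature, not pixels."""
    pred = ref if ref is not None else [0.0] * len(code)
    return [p + c for p, c in zip(pred, code)]


def reconstruct(feature: Feature) -> List[float]:
    """Stand-in for the reconstruction network (identity in this toy)."""
    return list(feature)


def analyze(feature: Feature) -> float:
    """Stand-in for an analysis network: mean of the feature."""
    return sum(feature) / len(feature)


def run_loop(frames: List[List[float]]) -> List[Dict]:
    """Feature-based loop: each decoded feature is the reference for the next frame."""
    ref: Optional[Feature] = None
    outputs = []
    for frame in frames:
        code = encode(frame, ref)
        feature = decode_to_feature(code, ref)
        # The same intermediate feature feeds both task heads.
        outputs.append({
            "code": code,
            "recon": reconstruct(feature),
            "score": analyze(feature),
        })
        ref = feature  # the reference stays in the feature domain
    return outputs


if __name__ == "__main__":
    for step in run_loop([[1.0, 2.0], [2.0, 4.0]]):
        print(step)
```

The point of the sketch is the data flow: the reference kept between frames is the intermediate feature rather than reconstructed pixels, and the reconstruction and analysis heads both read that same feature.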


Similar articles

1
VNVC: A Versatile Neural Video Coding Framework for Efficient Human-Machine Vision.
IEEE Trans Pattern Anal Mach Intell. 2024 Jul;46(7):4579-4596. doi: 10.1109/TPAMI.2024.3356548. Epub 2024 Jun 5.
2
New architecture for MPEG video streaming system with backward playback support.
IEEE Trans Image Process. 2007 Sep;16(9):2169-83. doi: 10.1109/tip.2007.902330.
3
Arbitrarily shaped motion prediction for depth video compression using arithmetic edge coding.
IEEE Trans Image Process. 2014 Nov;23(11):4696-708. doi: 10.1109/TIP.2014.2353817. Epub 2014 Aug 29.
4
Two-terminal video coding.
IEEE Trans Image Process. 2009 Mar;18(3):534-51. doi: 10.1109/TIP.2008.2010148.
5
3-D model-based frame interpolation for distributed video coding of static scenes.
IEEE Trans Image Process. 2007 May;16(5):1246-57. doi: 10.1109/tip.2007.894272.
6
Rate distortion optimization for H.264 interframe coding: a general framework and algorithms.
IEEE Trans Image Process. 2007 Jul;16(7):1774-84. doi: 10.1109/tip.2007.896685.
7
PRISM: A video coding paradigm with motion estimation at the decoder.
IEEE Trans Image Process. 2007 Oct;16(10):2436-48. doi: 10.1109/tip.2007.904949.
8
SSSIC: Semantics-to-Signal Scalable Image Coding With Learned Structural Representations.
IEEE Trans Image Process. 2021;30:8939-8954. doi: 10.1109/TIP.2021.3121131. Epub 2021 Oct 29.
9
Scalable Face Image Coding via StyleGAN Prior: Toward Compression for Human-Machine Collaborative Vision.
IEEE Trans Image Process. 2024;33:408-422. doi: 10.1109/TIP.2023.3343912. Epub 2023 Dec 29.
10
Estimation-theoretic approach to delayed decoding of predictively encoded video sequences.
IEEE Trans Image Process. 2013 Mar;22(3):1175-85. doi: 10.1109/TIP.2012.2227773. Epub 2012 Nov 16.