Suppr 超能文献



VRT: A Video Restoration Transformer.

Authors

Liang Jingyun, Cao Jiezhang, Fan Yuchen, Zhang Kai, Ranjan Rakesh, Li Yawei, Timofte Radu, Van Gool Luc

Publication

IEEE Trans Image Process. 2024;33:2171-2182. doi: 10.1109/TIP.2024.3372454. Epub 2024 Mar 22.

DOI: 10.1109/TIP.2024.3372454
PMID: 38451763
Abstract

Video restoration aims to restore high-quality frames from low-quality frames. Unlike single-image restoration, video restoration generally requires utilizing temporal information from multiple adjacent but usually misaligned video frames. Existing deep methods generally tackle this with a sliding-window strategy or a recurrent architecture, both of which are limited to frame-by-frame restoration. In this paper, we propose a Video Restoration Transformer (VRT) with parallel frame prediction ability. More specifically, VRT is composed of multiple scales, each of which consists of two kinds of modules: temporal mutual self attention (TMSA) and parallel warping. TMSA divides the video into small clips, on which mutual attention is applied for joint motion estimation, feature alignment and feature fusion, while self attention is used for feature extraction. To enable cross-clip interactions, the video sequence is shifted for every other layer. In addition, parallel warping is used to further fuse information from neighboring frames by parallel feature warping. Experimental results on five tasks, including video super-resolution, video deblurring, video denoising, video frame interpolation and space-time video super-resolution, demonstrate that VRT outperforms state-of-the-art methods by large margins (up to 2.16 dB) on fourteen benchmark datasets. The code is available at https://github.com/JingyunLiang/VRT.
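The clip partitioning and alternating temporal shift described in the abstract can be sketched in a few lines. The following is a toy NumPy illustration, not the authors' implementation: single-head attention over per-frame feature vectors, a hypothetical `clip_size`, and no mutual attention between clip halves are all simplifying assumptions, but it shows how shifting the sequence on every other layer lets frames near clip boundaries attend across clips.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def partition_clips(feats, clip_size, shift=False):
    # feats: (T, D) per-frame features; T must be divisible by clip_size.
    T, D = feats.shape
    if shift:
        # Roll by half a clip so clip boundaries move, letting attention
        # mix frames that belonged to neighboring clips in the previous layer.
        feats = np.roll(feats, -clip_size // 2, axis=0)
    return feats.reshape(T // clip_size, clip_size, D)

def clip_self_attention(clips):
    # Plain scaled dot-product self-attention within each clip.
    d = clips.shape[-1]
    scores = clips @ clips.transpose(0, 2, 1) / np.sqrt(d)
    return softmax(scores, axis=-1) @ clips

def attention_layer(feats, clip_size, shift):
    clips = partition_clips(feats, clip_size, shift)
    out = clip_self_attention(clips).reshape(feats.shape)
    if shift:
        out = np.roll(out, clip_size // 2, axis=0)  # undo the shift
    return out

# Stack two layers: unshifted, then shifted (cross-clip interaction).
T, D = 8, 4
x = np.random.default_rng(0).normal(size=(T, D))
y = attention_layer(attention_layer(x, clip_size=2, shift=False),
                    clip_size=2, shift=True)
```

After two such layers, information from one clip can reach frames in the adjacent clip, which is the stated purpose of shifting the sequence for every other layer.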


Similar Articles

1. VRT: A Video Restoration Transformer.
   IEEE Trans Image Process. 2024;33:2171-2182. doi: 10.1109/TIP.2024.3372454. Epub 2024 Mar 22.
2. STDAN: Deformable Attention Network for Space-Time Video Super-Resolution.
   IEEE Trans Neural Netw Learn Syst. 2024 Aug;35(8):10606-10616. doi: 10.1109/TNNLS.2023.3243029. Epub 2024 Aug 5.
3. Multi-Stage Network for Event-Based Video Deblurring with Residual Hint Attention.
   Sensors (Basel). 2023 Mar 7;23(6):2880. doi: 10.3390/s23062880.
4. Multi-Stage Feature Fusion Network for Video Super-Resolution.
   IEEE Trans Image Process. 2021;30:2923-2934. doi: 10.1109/TIP.2021.3056868. Epub 2021 Feb 12.
5. TTVFI: Learning Trajectory-Aware Transformer for Video Frame Interpolation.
   IEEE Trans Image Process. 2023;32:4728-4741. doi: 10.1109/TIP.2023.3302990. Epub 2023 Aug 22.
6. DSTAN: A Deformable Spatial-temporal Attention Network with Bidirectional Sequence Feature Refinement for Speckle Noise Removal in Thyroid Ultrasound Video.
   J Imaging Inform Med. 2024 Dec;37(6):3264-3281. doi: 10.1007/s10278-023-00935-5. Epub 2024 Jun 5.
7. Video Frame Interpolation and Enhancement via Pyramid Recurrent Framework.
   IEEE Trans Image Process. 2021;30:277-292. doi: 10.1109/TIP.2020.3033617. Epub 2020 Nov 20.
8. JNMR: Joint Non-Linear Motion Regression for Video Frame Interpolation.
   IEEE Trans Image Process. 2023;32:5283-5295. doi: 10.1109/TIP.2023.3315122. Epub 2023 Sep 22.
9. Joint Video Super-Resolution and Frame Interpolation via Permutation Invariance.
   Sensors (Basel). 2023 Feb 24;23(5):2529. doi: 10.3390/s23052529.
10. Video Summarization With Spatiotemporal Vision Transformer.
    IEEE Trans Image Process. 2023;32:3013-3026. doi: 10.1109/TIP.2023.3275069. Epub 2023 May 26.

Cited By

1. Local feature enhancement transformer for image super-resolution.
   Sci Rep. 2025 Jul 1;15(1):20792. doi: 10.1038/s41598-025-07650-x.
2. Low-Light Image and Video Enhancement for More Robust Computer Vision Tasks: A Review.
   J Imaging. 2025 Apr 21;11(4):125. doi: 10.3390/jimaging11040125.
3. SVTSR: image super-resolution using scattering vision transformer.
   Sci Rep. 2024 Dec 30;14(1):31770. doi: 10.1038/s41598-024-82650-x.
4. Tagged-to-Cine MRI Sequence Synthesis via Light Spatial-Temporal Transformer.
   Med Image Comput Comput Assist Interv. 2024 Oct;15007:701-711. doi: 10.1007/978-3-031-72104-5_67. Epub 2024 Oct 3.
5. Single-image super-resolution reconstruction based on phase-aware visual multi-layer perceptron (MLP).
   PeerJ Comput Sci. 2024 Jul 19;10:e2208. doi: 10.7717/peerj-cs.2208. eCollection 2024.
6. Video reconstruction from a single motion blurred image using learned dynamic phase coding.
   Sci Rep. 2023 Aug 21;13(1):13625. doi: 10.1038/s41598-023-40297-0.