
Unsupervised Low-Light Video Enhancement With Spatial-Temporal Co-Attention Transformer

Author information

Lv Xiaoqian, Zhang Shengping, Wang Chenyang, Zhang Weigang, Yao Hongxun, Huang Qingming

Publication information

IEEE Trans Image Process. 2023;32:4701-4715. doi: 10.1109/TIP.2023.3301332. Epub 2023 Aug 16.

DOI: 10.1109/TIP.2023.3301332
PMID: 37549080
Abstract

Existing low-light video enhancement methods are dominated by Convolutional Neural Networks (CNNs) trained in a supervised manner. Because paired dynamic low-/normal-light videos are difficult to collect in real-world scenes, these methods are usually trained on synthetic, static, uniform-motion videos, which undermines their generalization to real-world scenes. They also typically suffer from temporal inconsistency (e.g., flickering artifacts and motion blur) when handling large-scale motion, since the local perception property of CNNs limits their ability to model long-range dependencies in both the spatial and temporal domains. To address these problems, we propose, to the best of our knowledge, the first unsupervised method for low-light video enhancement, named LightenFormer, which models long-range intra- and inter-frame dependencies with a spatial-temporal co-attention transformer to enhance brightness while maintaining temporal consistency. Specifically, an effective yet lightweight S-curve Estimation Network (SCENet) is first proposed to estimate pixel-wise S-shaped non-linear curves (S-curves) that adaptively adjust the dynamic range of an input video. Next, to model the temporal consistency of the video, we present a Spatial-Temporal Refinement Network (STRNet) to refine the enhanced video. The core module of STRNet is a novel Spatial-Temporal Co-attention Transformer (STCAT), which exploits multi-scale self- and cross-attention interactions to capture long-range correlations in both the spatial and temporal domains among frames for implicit motion estimation. To achieve unsupervised training, we further propose two non-reference loss functions based on the invertibility of the S-curve and the noise independence among frames. Extensive experiments on the SDSD and LLIV-Phone datasets demonstrate that LightenFormer outperforms state-of-the-art methods.
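
To make the curve mechanics concrete, below is a minimal NumPy sketch of an invertible pixel-wise S-curve and the kind of non-reference invertibility loss the abstract describes. The logistic parameterization, the per-pixel slope/midpoint parameters (k, m), and the names apply_s_curve, invert_s_curve, and invertibility_loss are illustrative assumptions; the paper's SCENet predicts its own curve family, and its exact loss formulation may differ.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def apply_s_curve(x, k, m):
    """Pixel-wise S-curve tone mapping on intensities x in [0, 1].

    k: per-pixel slope (> 0), m: per-pixel midpoint in (0, 1).
    Normalized so that f(0) = 0 and f(1) = 1, hence invertible on [0, 1].
    A low midpoint m brightens dark pixels, adjusting dynamic range.
    """
    s0 = sigmoid(k * (0.0 - m))
    s1 = sigmoid(k * (1.0 - m))
    return (sigmoid(k * (x - m)) - s0) / (s1 - s0)

def invert_s_curve(y, k, m):
    """Closed-form inverse of apply_s_curve for the same (k, m)."""
    s0 = sigmoid(k * (0.0 - m))
    s1 = sigmoid(k * (1.0 - m))
    u = np.clip(y * (s1 - s0) + s0, 1e-6, 1.0 - 1e-6)  # keep logit finite
    return m + np.log(u / (1.0 - u)) / k

def invertibility_loss(low, refined, k, m):
    """Non-reference loss in the spirit of the paper's invertibility term:
    mapping the refined output back through the inverse curve should
    reproduce the low-light input, since refinement should preserve
    content. This is an interpretation, not the paper's exact loss."""
    return float(np.mean((invert_s_curve(refined, k, m) - low) ** 2))

# Toy usage on a single-channel "frame": in the paper these per-pixel
# parameters would come from SCENet; here they are fixed by hand.
rng = np.random.default_rng(0)
low = rng.uniform(0.0, 0.3, size=(4, 4))        # under-exposed frame
k = np.full_like(low, 6.0)                      # per-pixel slope
m = np.full_like(low, 0.2)                      # per-pixel midpoint
enhanced = apply_s_curve(low, k, m)
print(invertibility_loss(low, enhanced, k, m))  # ~0 for the ideal curve
```

The abstract's second non-reference loss relies on noise being independent across frames while scene content is shared; implementing it would require the cross-frame correspondences that STCAT estimates, so it is not sketched here.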
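The description of STCAT boils down to pairing self-attention (within a frame) with cross-attention (across frames) so that each location can aggregate evidence from anywhere in a neighboring frame, which is what makes the motion estimation implicit. A bare-bones sketch of that pairing follows; the single head, the omission of learned projections and the multi-scale structure, and the name cross_frame_attention are simplifications introduced here, not the paper's architecture.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def cross_frame_attention(q_feat, kv_feat, d_head):
    """Single-head scaled dot-product attention where queries come from
    the current frame and keys/values from another frame. Passing the
    same frame twice gives self- (intra-frame) attention; passing a
    neighbor gives cross- (inter-frame) attention. Learned W_q, W_k,
    W_v projections are omitted for brevity."""
    scores = q_feat @ kv_feat.T / np.sqrt(d_head)  # (Nq, Nk) affinities
    weights = softmax(scores, axis=-1)             # rows sum to 1
    return weights @ kv_feat                       # (Nq, d_head)

# Toy usage: 16 tokens per frame, 32-dim features, two adjacent frames.
rng = np.random.default_rng(1)
frame_t  = rng.normal(size=(16, 32))
frame_t1 = rng.normal(size=(16, 32))
self_out  = cross_frame_attention(frame_t, frame_t,  32)  # intra-frame
cross_out = cross_frame_attention(frame_t, frame_t1, 32)  # inter-frame
print(self_out.shape, cross_out.shape)  # (16, 32) (16, 32)
```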

Similar articles

1. Unsupervised Low-Light Video Enhancement With Spatial-Temporal Co-Attention Transformer.
IEEE Trans Image Process. 2023;32:4701-4715. doi: 10.1109/TIP.2023.3301332. Epub 2023 Aug 16.

2. DSTAN: A Deformable Spatial-temporal Attention Network with Bidirectional Sequence Feature Refinement for Speckle Noise Removal in Thyroid Ultrasound Video.
J Imaging Inform Med. 2024 Dec;37(6):3264-3281. doi: 10.1007/s10278-023-00935-5. Epub 2024 Jun 5.

3. DyGraphformer: Transformer combining dynamic spatio-temporal graph network for multivariate time series forecasting.
Neural Netw. 2025 Jan;181:106776. doi: 10.1016/j.neunet.2024.106776. Epub 2024 Oct 17.

4. An Effective Video Transformer With Synchronized Spatiotemporal and Spatial Self-Attention for Action Recognition.
IEEE Trans Neural Netw Learn Syst. 2024 Feb;35(2):2496-2509. doi: 10.1109/TNNLS.2022.3190367. Epub 2024 Feb 5.

5. Video Summarization With Spatiotemporal Vision Transformer.
IEEE Trans Image Process. 2023;32:3013-3026. doi: 10.1109/TIP.2023.3275069. Epub 2023 May 26.

6. TCGL: Temporal Contrastive Graph for Self-Supervised Video Representation Learning.
IEEE Trans Image Process. 2022;31:1978-1993. doi: 10.1109/TIP.2022.3147032. Epub 2022 Feb 18.

7. Cascaded Deep Video Deblurring Using Temporal Sharpness Prior and Non-Local Spatial-Temporal Similarity.
IEEE Trans Pattern Anal Mach Intell. 2023 Aug;45(8):9411-9425. doi: 10.1109/TPAMI.2023.3243059. Epub 2023 Jun 30.

8. Deeply Coupled Convolution-Transformer With Spatial-Temporal Complementary Learning for Video-Based Person Re-Identification.
IEEE Trans Neural Netw Learn Syst. 2024 Oct;35(10):13753-13763. doi: 10.1109/TNNLS.2023.3271353. Epub 2024 Oct 7.

9. Video Crowd Localization With Multifocus Gaussian Neighborhood Attention and a Large-Scale Benchmark.
IEEE Trans Image Process. 2022;31:6032-6047. doi: 10.1109/TIP.2022.3205210. Epub 2022 Sep 19.

10. A modality-collaborative convolution and transformer hybrid network for unpaired multi-modal medical image segmentation with limited annotations.
Med Phys. 2023 Sep;50(9):5460-5478. doi: 10.1002/mp.16338. Epub 2023 Mar 15.