

LGIT: local-global interaction transformer for low-light image denoising.

Authors

Chen Zuojun, Qin Pinle, Zeng Jianchao, Song Quanzhen, Zhao Pengcheng, Chai Rui

Affiliations

School of Computer Science and Technology, North University of China, Taiyuan, 030051, China.

Publication Information

Sci Rep. 2024 Sep 18;14(1):21760. doi: 10.1038/s41598-024-72912-z.

Abstract

Transformer-based methods effectively capture global dependencies in images and have demonstrated outstanding performance in multiple visual tasks. However, existing Transformers cannot effectively denoise large noisy images captured under low-light conditions because (1) the global self-attention mechanism incurs high computational complexity in the spatial dimension, as its computation grows quadratically with the number of tokens, and (2) channel-wise self-attention cannot optimise the spatial correlations in images. We propose a local-global interaction Transformer (LGIT) that employs an adaptive strategy to select relevant patches for global interaction, achieving low computational complexity in global self-attention computation. A top-N patch cross-attention model (TPCA) is designed based on superpixel segmentation guidance. TPCA selects the top-N patches most similar to the target image patch and applies cross-attention to aggregate their information into the target patch, effectively enhancing the utilisation of the image's nonlocal self-similarity. A mixed-scale dual-gated feedforward network (MDGFF) is introduced for the effective extraction of multiscale local correlations. TPCA and MDGFF are combined to construct a hierarchical encoder-decoder network, LGIT, which computes self-attention within and across patches at different scales. Extensive experiments on real-world image-denoising datasets demonstrate that LGIT outperforms state-of-the-art (SOTA) convolutional neural network (CNN) and Transformer-based methods in both qualitative and quantitative results.
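The core TPCA idea described above, selecting the N patches most similar to a target patch and aggregating their information via cross-attention, can be sketched as follows. This is a minimal single-head illustration in NumPy, not the paper's implementation: the similarity measure, the superpixel-segmentation guidance, and the learned query/key/value projections of the actual model are omitted, and the function name and cosine-similarity choice are assumptions for illustration.

```python
import numpy as np

def top_n_patch_cross_attention(patches, target_idx, n=3):
    """Aggregate the N patches most similar to the target patch
    into the target via a single cross-attention step.

    patches: (P, D) array of flattened patch features.
    target_idx: index of the target patch.
    n: number of most-similar patches to attend over.
    """
    target = patches[target_idx]                        # (D,)

    # Cosine similarity between the target and every patch.
    norms = np.linalg.norm(patches, axis=1) * np.linalg.norm(target) + 1e-8
    sims = patches @ target / norms
    sims[target_idx] = -np.inf                          # exclude the target itself
    top_n = np.argsort(sims)[-n:]                       # indices of the top-N patches

    # Cross-attention: target patch is the query; top-N patches are keys/values.
    q, k, v = target, patches[top_n], patches[top_n]
    scores = k @ q / np.sqrt(q.shape[0])                # (n,) scaled dot products
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                            # softmax over the N patches
    return weights @ v                                  # aggregated feature, shape (D,)
```

Restricting attention to N patches instead of all P tokens is what keeps the global interaction's cost low: the attention step here is O(N·D) per target patch rather than quadratic in the total token count.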


Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/171f/11410926/3c9479ff2d8b/41598_2024_72912_Fig1_HTML.jpg
