Weijie Huang, Detian Huang
School of Business, Huaqiao University, Quanzhou, 362021, Fujian Province, China.
College of Engineering, Huaqiao University, Quanzhou, 362021, Fujian Province, China.
Sci Rep. 2025 Jul 1;15(1):20792. doi: 10.1038/s41598-025-07650-x.
Transformers have demonstrated remarkable success in image super-resolution (SR) owing to their powerful long-range dependency modeling capability. Although increasing the sliding window size of transformer-based models (e.g., SwinIR) can improve SR performance, it weakens the learning of fine-level local features, resulting in blurry details in the reconstructed images. To address this limitation, we propose a local feature enhancement transformer for image super-resolution (LFESR) that benefits from global feature capture while enhancing local feature interaction. The basis of our LFESR is the local feature enhancement transformer (LFET), which achieves a balance between spatial processing and channel configuration in self-attention. Our LFET contains neighborhood self-attention (NSA) and a ghost head, both of which can be easily applied to existing SR networks based on window self-attention. First, NSA utilizes the Hadamard operation to implement a third-order mapping that enhances local interaction, thus providing clues for high-quality image reconstruction. Next, the novel ghost head combines attention maps with static matrices to increase the channel capacity, thereby enhancing the inference capability of local features. Finally, a ConvFFN is incorporated to further strengthen high-frequency detail information in the reconstructed images. Extensive experiments validate the proposed LFESR, which significantly outperforms state-of-the-art methods in terms of both visual quality and quantitative metrics. In particular, the proposed LFESR exceeds SwinIR in PSNR by 0.49 dB and 0.52 dB at a scaling factor of ×4 on the Urban100 and Manga109 datasets, respectively.
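The two mechanisms named in the abstract can be illustrated with a toy sketch. This is not the paper's implementation; the function name `nsa_ghost_head`, the blending weight `alpha`, and the exact placement of the Hadamard product are assumptions made for illustration. The sketch shows (a) a ghost head as a data-independent ("static") attention matrix blended with the usual dynamic attention map to add channel capacity cheaply, and (b) a Hadamard (elementwise) product with `q * k` that raises the overall mapping to third order in the input, the kind of higher-order local interaction the NSA module is described as providing.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(z - z.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def nsa_ghost_head(x, Wq, Wk, Wv, static_attn, alpha=0.5):
    """Toy neighborhood self-attention for one local window of tokens.

    x:           (n, d) tokens from a small spatial neighborhood.
    static_attn: (n, n) input-independent logits -- the "ghost head",
                 blended with the dynamic attention map (hypothetical
                 blending scheme; the paper's combination may differ).
    The final Hadamard product with (q * k) makes the output third-order
    in the input, sharpening local feature interaction.
    """
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scale = np.sqrt(q.shape[-1])
    dyn = softmax(q @ k.T / scale)                  # data-dependent attention map
    attn = alpha * dyn + (1 - alpha) * softmax(static_attn)
    out = attn @ v                                  # standard second-order aggregation
    return out * (q * k)                            # Hadamard -> third-order term

# Usage: a 4x4 window (16 tokens) with 8 channels.
n, d = 16, 8
x = rng.standard_normal((n, d))
Wq, Wk, Wv = (rng.standard_normal((d, d)) * 0.1 for _ in range(3))
static_attn = rng.standard_normal((n, n))
y = nsa_ghost_head(x, Wq, Wk, Wv, static_attn)
print(y.shape)  # (16, 8): same token/channel layout as the input window
```

Because `static_attn` does not depend on the input, it costs no extra matrix multiplications at attention time, which is the appeal of augmenting capacity this way rather than adding a full attention head.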