Liang Jiabao, Jin Yutao, Chen Xiaoyan, Huang Haotian, Deng Yue
School of Electronic Information and Automation, Tianjin, China.
Sci Rep. 2024 Dec 30;14(1):31770. doi: 10.1038/s41598-024-82650-x.
Vision transformers have garnered substantial attention and attained impressive performance in image super-resolution tasks. Nevertheless, these networks face challenges associated with attention complexity and the effective capture of intricate, fine-grained details within images. These hurdles impede the efficient and scalable deployment of transformer models for image super-resolution in real-world applications. In this paper, we present a novel vision transformer called the Scattering Vision Transformer for Super-Resolution (SVTSR) to tackle these challenges. SVTSR integrates a spectral scattering network to efficiently capture intricate image details. It addresses the invertibility problem commonly encountered in down-sampling operations by separating low-frequency and high-frequency components. Additionally, SVTSR introduces a novel spectral gating network that uses Einstein multiplication for token and channel mixing, effectively reducing complexity. Extensive experiments demonstrate the effectiveness of the proposed vision transformer for image super-resolution. Our method not only outperforms state-of-the-art methods on the PSNR and SSIM metrics but, more significantly, reduces model parameters by more than a factor of ten compared to the baseline model. As shown in Fig. 1, this substantial decrease in parameter count is highly advantageous for the deployment and practical application of super-resolution models. Code is available at https://github.com/LiangJiabaoY/SVTSR.git.
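To illustrate the general idea behind spectral gating with Einstein multiplication, the sketch below shows a minimal, hypothetical version of the operation: tokens are mapped to the frequency domain, gated per frequency (token mixing), and mixed across channels, all via `einsum` contractions rather than quadratic self-attention. The function name, weight shapes, and use of a real FFT are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def spectral_gate(x, w_token, w_channel):
    """Hypothetical sketch of spectral gating: mix tokens in the
    frequency domain and mix channels with Einstein summation,
    avoiding the quadratic cost of full self-attention."""
    # x: (tokens, channels); real FFT along the token axis
    xf = np.fft.rfft(x, axis=0)                    # (freqs, channels), complex
    xf = np.einsum('fc,f->fc', xf, w_token)        # per-frequency gate (token mixing)
    xf = np.einsum('fc,cd->fd', xf, w_channel)     # channel mixing
    return np.fft.irfft(xf, n=x.shape[0], axis=0)  # back to the token domain

rng = np.random.default_rng(0)
tokens, channels = 16, 8
x = rng.standard_normal((tokens, channels))
w_token = np.ones(tokens // 2 + 1)  # identity gate (learned in practice)
w_channel = np.eye(channels)        # identity mixing (learned in practice)
y = spectral_gate(x, w_token, w_channel)
```

With identity weights the round trip through the FFT recovers the input, which is a quick sanity check that the contraction indices are wired correctly; in a trained model the gates and mixing matrices would be learned parameters.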