Wang Li, Li Xing, Tian Wei, Peng Jianhua, Chen Rui
School of Computer and Software, Nanjing Vocational University of Industry Technology, Nanjing, 210023, China.
College of Information Science and Technology, College of Artificial Intelligence, Nanjing Forestry University, Nanjing, 210037, China.
Sci Rep. 2024 May 21;14(1):11601. doi: 10.1038/s41598-024-62633-8.
The emergence of convolutional neural networks (CNNs) and Transformers has recently facilitated significant advances in image super-resolution (SR). However, these networks commonly rely on complex structures with large parameter counts and high computational costs to boost reconstruction performance. In addition, they do not exploit structural priors well, which hinders high-quality image reconstruction. In this work, we devise a lightweight interactive feature inference network (IFIN) that combines the strengths of CNNs and Transformers for effective image SR reconstruction. Specifically, the interactive feature aggregation module (IFAM), built from a structure-aware attention block (SAAB), a Swin Transformer block (SWTB), and an enhanced spatial adaptive block (ESAB), serves as the network backbone and progressively extracts more dedicated features to facilitate the reconstruction of high-frequency details in the image. SAAB adaptively recalibrates locally salient structural information, and SWTB effectively captures rich global information. Further, ESAB synergistically complements local and global priors to ensure consistent fusion of diverse features, achieving high-quality image reconstruction. Comprehensive experiments reveal that our proposed network attains state-of-the-art reconstruction accuracy on benchmark datasets while maintaining low computational demands. Our code and results are available at: https://github.com/wwaannggllii/IFIN .
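As a rough illustration of the module composition described in the abstract, the sketch below assembles a single IFAM from a local SAAB branch, a global SWTB branch, and an ESAB fusion step. All layer choices, channel sizes, and the simplified global attention (plain self-attention rather than shifted-window attention) are assumptions made only for illustration; the authors' actual implementation is available in the linked repository.

```python
# Illustrative sketch (not the authors' implementation) of the IFAM backbone:
# SAAB = local structure-aware recalibration, SWTB = global branch (simplified),
# ESAB = fusion of local and global priors.
import torch
import torch.nn as nn

class SAAB(nn.Module):
    """Hypothetical structure-aware attention block: recalibrates local features."""
    def __init__(self, channels):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.GELU(),
            nn.Conv2d(channels, channels, 3, padding=1),
        )
        # Channel attention used as a stand-in for salient-structure recalibration.
        self.attn = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels, 1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        y = self.conv(x)
        return x + y * self.attn(y)

class SWTB(nn.Module):
    """Simplified global branch: self-attention over all spatial positions
    (the real SWTB uses shifted-window attention from the Swin Transformer)."""
    def __init__(self, channels, heads=4):
        super().__init__()
        self.norm = nn.LayerNorm(channels)
        self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)

    def forward(self, x):
        b, c, h, w = x.shape
        seq = self.norm(x.flatten(2).transpose(1, 2))  # (B, H*W, C)
        out, _ = self.attn(seq, seq, seq)
        return x + out.transpose(1, 2).reshape(b, c, h, w)

class ESAB(nn.Module):
    """Hypothetical enhanced spatial adaptive block: fuses local and global priors."""
    def __init__(self, channels):
        super().__init__()
        self.fuse = nn.Conv2d(2 * channels, channels, 1)

    def forward(self, local_feat, global_feat):
        return self.fuse(torch.cat([local_feat, global_feat], dim=1))

class IFAM(nn.Module):
    """One interactive feature aggregation module: local + global branches, then fusion."""
    def __init__(self, channels):
        super().__init__()
        self.saab = SAAB(channels)
        self.swtb = SWTB(channels)
        self.esab = ESAB(channels)

    def forward(self, x):
        return x + self.esab(self.saab(x), self.swtb(x))

if __name__ == "__main__":
    x = torch.randn(1, 32, 48, 48)   # a low-resolution feature map
    print(IFAM(32)(x).shape)         # torch.Size([1, 32, 48, 48])
```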