Cai Pengfei, Li Biyuan, Ma Jinying, Tian Xiao, Huo Lianhao, Jia Xuefeng
School of Electronic Engineering, Tianjin University of Technology and Education, Tianjin, 300222, People's Republic of China.
Tianjin Development Zone Jingnuohanhai Data Technology Co., Ltd, People's Republic of China.
Biomed Phys Eng Express. 2025 Aug 5;11(5). doi: 10.1088/2057-1976/adf3b5.
Low-quality fundus images pose significant challenges for diabetic retinopathy (DR) classification: noise, blurred boundaries, and the loss of high-frequency detail hinder both global contextual understanding and local fine-grained feature extraction. To address these limitations, this work proposes a Hierarchical Frequency-Spatial Feature Fusion Network (HFSF-Net), which integrates frequency-domain and spatial-domain information for robust DR classification. First, this paper introduces the Spatial Prior-Aware Transformer (SPAT) Block, which incorporates spatial prior knowledge to guide the attention distribution, enabling precise localization of the complexly distributed lesion regions in low-quality fundus images. Next, a novel Wavelet-Enhanced Self-Attention (WESA) module is developed that uses wavelet transforms to extract and enhance high-frequency components such as microvascular textures and edges. Building on WESA, the Wavelet-Enhanced Transformer (WET) Block is constructed to strengthen the recovery of local details in degraded images. Finally, a Hierarchical Frequency-Spatial Fusion (HFSF) module hierarchically integrates multi-scale features, mitigating information redundancy and resolving feature conflicts between the two domains. Through this architecture, the model achieves a balanced representation of global and local information. Experiments on the APTOS and DDR datasets yield accuracies (ACC) of 0.8117 and 0.8021 and Kappa scores of 0.7158 and 0.6731, respectively. Although the model's Recall is not exceptionally high, its consistently strong performance on the other key metrics supports this claim of balanced global and local representation. Overall, the experimental results validate the effectiveness and robustness of HFSF-Net for classifying diabetic retinopathy from low-quality fundus images.
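The abstract does not give the internal equations of WESA, so the following is only a rough NumPy illustration of the underlying idea that a wavelet transform can isolate high-frequency content (edges, fine texture) and amplify it. It assumes a single-level 2D Haar transform and a fixed, hypothetical `gain` on the three detail subbands; the function names are illustrative, and the sketch omits the self-attention part of WESA entirely.

```python
import numpy as np


def haar_dwt2(x):
    """Single-level 2D Haar DWT over the last two axes of x (H and W must be even).

    Returns (ll, lh, hl, hh): one low-frequency approximation subband and three
    high-frequency detail subbands, each at half resolution. The transform is
    orthonormal, so it preserves signal energy.
    """
    x = np.asarray(x, dtype=float)
    if x.shape[-2] % 2 or x.shape[-1] % 2:
        raise ValueError("height and width must be even")
    # The four pixels of every non-overlapping 2x2 block
    a = x[..., 0::2, 0::2]
    b = x[..., 0::2, 1::2]
    c = x[..., 1::2, 0::2]
    d = x[..., 1::2, 1::2]
    ll = (a + b + c + d) / 2.0  # local average (low frequency)
    lh = (a - b + c - d) / 2.0  # left/right difference within the block
    hl = (a + b - c - d) / 2.0  # top/bottom difference within the block
    hh = (a - b - c + d) / 2.0  # diagonal difference
    return ll, lh, hl, hh


def haar_idwt2(ll, lh, hl, hh):
    """Exact inverse of haar_dwt2: rebuilds each 2x2 block and interleaves them."""
    a = (ll + lh + hl + hh) / 2.0
    b = (ll - lh + hl - hh) / 2.0
    c = (ll + lh - hl - hh) / 2.0
    d = (ll - lh - hl + hh) / 2.0
    out = np.empty(a.shape[:-2] + (2 * a.shape[-2], 2 * a.shape[-1]), dtype=a.dtype)
    out[..., 0::2, 0::2] = a
    out[..., 0::2, 1::2] = b
    out[..., 1::2, 0::2] = c
    out[..., 1::2, 1::2] = d
    return out


def enhance_high_freq(x, gain=1.5):
    """Hypothetical enhancement: scale the three detail subbands by `gain`, keep LL."""
    ll, lh, hl, hh = haar_dwt2(x)
    return haar_idwt2(ll, gain * lh, gain * hl, gain * hh)


# Usage on a hypothetical (channels, height, width) feature map
feat = np.random.default_rng(0).normal(size=(8, 32, 32))
sharpened = enhance_high_freq(feat, gain=1.5)
```

A fixed gain simply makes the high-frequency emphasis visible; a learned module such as WESA would presumably decide how much each subband contributes rather than scale all of them uniformly.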
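The abstract reports Kappa without naming the variant. Assuming it is the quadratic weighted kappa commonly used for five-grade DR benchmarks (the APTOS 2019 Kaggle challenge was scored this way), the metric can be sketched as below. Grades are assumed to be integers 0 to 4, and the function name is illustrative; unlike plain accuracy, this score penalizes a prediction more heavily the further it lands from the true grade.

```python
import numpy as np


def quadratic_weighted_kappa(y_true, y_pred, num_classes=5):
    """Cohen's kappa with quadratic weights for ordinal grades 0..num_classes-1."""
    y_true = np.asarray(y_true, dtype=int)
    y_pred = np.asarray(y_pred, dtype=int)
    for y in (y_true, y_pred):
        if y.min() < 0 or y.max() >= num_classes:
            raise ValueError("labels must lie in 0..num_classes-1")
    # Observed confusion matrix: rows = true grade, columns = predicted grade
    observed = np.zeros((num_classes, num_classes))
    np.add.at(observed, (y_true, y_pred), 1)
    # Expected matrix if predictions were independent of the truth,
    # built from the two marginal histograms and scaled to the same total
    expected = np.outer(observed.sum(axis=1), observed.sum(axis=0)) / observed.sum()
    # Quadratic penalty: predicting grade 4 for a grade-0 image costs 16x a one-grade miss
    grades = np.arange(num_classes)
    weights = (grades[:, None] - grades[None, :]) ** 2 / (num_classes - 1) ** 2
    denom = (weights * expected).sum()
    if denom == 0:
        raise ValueError("kappa is undefined when expected disagreement is zero")
    return 1.0 - (weights * observed).sum() / denom
```

A score of 1 means perfect agreement, 0 means agreement no better than chance given the label distributions, and negative values mean systematic disagreement.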