Arnob Arjun Kumar Bose, Chayon Muhammad Hasibur Rashid, Al Farid Fahmid, Husen Mohd Nizam, Ahmed Firoz
Department of Computer Science, American International University-Bangladesh, Dhaka 1229, Bangladesh.
Faculty of Computer Science and Informatics, Berlin School of Business and Innovation, 12043 Berlin, Germany.
J Imaging. 2025 Aug 15;11(8):275. doi: 10.3390/jimaging11080275.
Timely, balanced, and transparent detection of retinal diseases is essential to avert irreversible vision loss; however, current deep learning screeners are hampered by class imbalance, large models, and opaque reasoning. This paper presents a lightweight attention-augmented convolutional neural network (CNN) that addresses all three barriers. The network combines depthwise separable convolutions, squeeze-and-excitation, and global-context attention, and it incorporates gradient-based class activation mapping (Grad-CAM) and Grad-CAM++ to ensure that every decision is accompanied by pixel-level evidence. A 5335-image, ten-class color-fundus dataset from Bangladeshi clinics, which was severely skewed (17–1509 images per class), was equalized using the synthetic minority oversampling technique (SMOTE) and task-specific augmentations. Images were resized to 150×150 px and split 70:15:15. Training used the adaptive moment estimation (Adam) optimizer (initial learning rate of 1×10⁻⁴, reduce-on-plateau, early stopping), ℓ2 regularization, and dual dropout. The 16.6 M-parameter network converged in fewer than 50 epochs on a mid-range graphics processing unit (GPU) and reached 87.9% test accuracy, a macro-precision of 0.882, a macro-recall of 0.879, and a macro-F1-score of 0.880, reducing the error by 58% relative to the best ImageNet backbone (Inception-V3, 40.4% accuracy). Eight disorders recorded true-positive rates above 95%; macular scar and central serous chorioretinopathy attained F1-scores of 0.77 and 0.89, respectively. Saliency maps consistently highlighted optic disc margins, subretinal fluid, and other hallmarks. Targeted class re-balancing, lightweight attention, and integrated explainability therefore deliver accurate, transparent, and deployable retinal screening suitable for point-of-care ophthalmic triage on resource-limited hardware.
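As a rough illustration of the kind of pipeline the abstract describes (depthwise separable convolutions with squeeze-and-excitation, Adam at 1×10⁻⁴ with reduce-on-plateau and early stopping, ℓ2 regularization, and two dropout layers), the sketch below shows one possible Keras formulation. The layer counts, filter sizes, dropout rates, and callback patience values are illustrative assumptions and are not taken from the paper; the global-context attention module is approximated by global average pooling, and the SMOTE re-balancing step is omitted.

# Hypothetical sketch of a lightweight attention-augmented CNN in the spirit of the
# abstract; architecture details are assumptions, not the authors' configuration.
import tensorflow as tf
from tensorflow.keras import layers, regularizers

NUM_CLASSES = 10              # ten retinal disease classes
INPUT_SHAPE = (150, 150, 3)   # images resized to 150x150 px

def se_block(x, ratio=16):
    """Squeeze-and-excitation: reweight feature channels by global context."""
    channels = x.shape[-1]
    s = layers.GlobalAveragePooling2D()(x)
    s = layers.Dense(channels // ratio, activation="relu")(s)
    s = layers.Dense(channels, activation="sigmoid")(s)
    s = layers.Reshape((1, 1, channels))(s)
    return layers.Multiply()([x, s])

def build_model():
    inputs = layers.Input(shape=INPUT_SHAPE)
    x = inputs
    for filters in (32, 64, 128, 256):          # assumed filter progression
        x = layers.SeparableConv2D(
            filters, 3, padding="same", activation="relu",
            depthwise_regularizer=regularizers.l2(1e-4),
            pointwise_regularizer=regularizers.l2(1e-4))(x)
        x = layers.BatchNormalization()(x)
        x = se_block(x)                         # channel attention
        x = layers.MaxPooling2D()(x)
    x = layers.GlobalAveragePooling2D()(x)      # stand-in for global-context attention
    x = layers.Dropout(0.4)(x)                  # first of the two dropout layers
    x = layers.Dense(256, activation="relu",
                     kernel_regularizer=regularizers.l2(1e-4))(x)
    x = layers.Dropout(0.4)(x)                  # second dropout layer
    outputs = layers.Dense(NUM_CLASSES, activation="softmax")(x)
    return tf.keras.Model(inputs, outputs)

model = build_model()
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
    loss="categorical_crossentropy",
    metrics=["accuracy"])

callbacks = [
    tf.keras.callbacks.ReduceLROnPlateau(monitor="val_loss", factor=0.5, patience=3),
    tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=8,
                                     restore_best_weights=True),
]
# model.fit(train_ds, validation_data=val_ds, epochs=50, callbacks=callbacks)

In the reported setup, minority-class oversampling and task-specific augmentation would be applied to the training split before fitting, and Grad-CAM/Grad-CAM++ saliency maps would be computed on the trained model's final convolutional features.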