Ahmed Faisal, Uddin M D Joshem
Department of Data Science and Mathematics, Embry-Riddle Aeronautical University, 3700 Willow Creek Rd, Prescott, 86301, AZ, USA.
Department of Mathematical Sciences, The University of Texas at Dallas, 800 W Campbell Rd, Richardson, 75080, TX, USA.
J Imaging Inform Med. 2025 Sep 19. doi: 10.1007/s10278-025-01676-3.
Early detection and accurate classification of retinal diseases, such as diabetic retinopathy (DR) and age-related macular degeneration (AMD), are essential to preventing vision loss and improving patient outcomes. Traditional methods for analyzing retinal fundus images are often manual, time-consuming, and dependent on clinician expertise, leading to delays in diagnosis and treatment. Recent advances in machine learning, particularly deep learning, have introduced automated systems to assist in retinal disease detection; however, challenges such as computational inefficiency and limited robustness remain. This paper proposes a novel approach that applies vision transformers (ViT) through transfer learning to address these challenges in ophthalmic diagnostics. We fine-tune a pre-trained ViT-Base-Patch16-224 model for DR and AMD classification tasks. To adapt the model to retinal fundus images, we implement a streamlined preprocessing pipeline that converts the images into PyTorch tensors and standardizes them, ensuring compatibility with the ViT architecture and improving model performance. We validated our model, OcuViT, on two datasets: the APTOS dataset for binary and five-level DR severity classification, and the IChallenge-AMD dataset for AMD grading. In the five-class DR and AMD grading tasks, OcuViT outperforms existing CNN- and ViT-based methods across multiple metrics, achieving superior accuracy and robustness. For the binary DR task, it delivers highly competitive performance. These results demonstrate that OcuViT effectively leverages ViT-based transfer learning with an efficient preprocessing pipeline, significantly improving the precision and reliability of automated ophthalmic diagnosis.
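The preprocessing step described above can be sketched as follows. This is a minimal illustration, not the authors' released code: it assumes the standard 224x224 input size of ViT-Base-Patch16-224 and the usual ImageNet normalization statistics, and it uses PIL/NumPy in place of the paper's PyTorch tensor conversion so the sketch is self-contained (the resulting CHW array maps directly onto `torch.from_numpy`). The function name `preprocess_fundus_image` is hypothetical.

```python
import numpy as np
from PIL import Image

# Standard ImageNet statistics, commonly used when fine-tuning a
# pre-trained ViT; the paper's exact normalization may differ.
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)


def preprocess_fundus_image(img: Image.Image) -> np.ndarray:
    """Resize a retinal fundus image to 224x224, scale pixel values to
    [0, 1], standardize channel-wise, and return a (3, 224, 224) CHW
    array ready to be wrapped as a PyTorch tensor for ViT-Base-Patch16-224."""
    img = img.convert("RGB").resize((224, 224), Image.BILINEAR)
    x = np.asarray(img, dtype=np.float32) / 255.0   # HWC array in [0, 1]
    x = (x - IMAGENET_MEAN) / IMAGENET_STD          # channel-wise standardization
    return x.transpose(2, 0, 1)                     # HWC -> CHW layout


if __name__ == "__main__":
    # Synthetic stand-in for a fundus photograph; real inputs would be
    # loaded from the APTOS or IChallenge-AMD datasets.
    demo = Image.new("RGB", (512, 512), color=(120, 60, 30))
    out = preprocess_fundus_image(demo)
    print(out.shape)  # (3, 224, 224)
```

In a PyTorch pipeline the returned array would typically be converted with `torch.from_numpy(...)` and batched before being passed to the fine-tuned model.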