Multi-stage framework using transformer models, feature fusion and ensemble learning for enhancing eye disease classification.

Author information

AlMohimeed Abdulaziz

Affiliation

College of Computer and Information Sciences, Imam Mohammad Ibn Saud Islamic University (IMSIU), Riyadh, Saudi Arabia.

Publication information

Sci Rep. 2025 Aug 19;15(1):30469. doi: 10.1038/s41598-025-16415-5.

DOI:10.1038/s41598-025-16415-5

PMID:40830659

Abstract

Eye diseases can impair vision and well-being, so early and accurate diagnosis is crucial to preventing serious impairment. Deep learning models have shown promise for automating the diagnosis of eye diseases from images. However, current methods mostly rely on single-model architectures such as convolutional neural networks (CNNs), which may not adequately capture both the long-range spatial correlations and the local fine-grained features required for classification. To address these limitations, this study proposes a multi-stage framework for eye diseases (MST-EDS) with two stages: hybrid models followed by stacking models. The framework classifies eye images into four classes (normal, diabetic_retinopathy, glaucoma, and cataract) using a benchmark dataset from Kaggle. In the hybrid stage, three Transformer models, Vision Transformer (ViT), Data-efficient Image Transformer (DeiT), and Swin Transformer, extract deep features from the images; Principal Component Analysis (PCA) reduces the complexity of the extracted features; and Machine Learning (ML) models serve as classifiers to enhance performance. In the stacking stage, the outputs of the best hybrid models are stacked and used to train and evaluate meta-learners to improve classification performance. The experimental results show that the MST-EDS-RF model achieved the best performance, with 97.163% accuracy, outperforming the individual Transformer and hybrid models.
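The two-stage idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the ViT, DeiT, and Swin features are replaced by synthetic stand-in vectors, logistic regression is an arbitrary choice of stage-one classifier, the stacked "outputs" are assumed to be class probabilities, and the random forest meta-learner is only a guess at what the "RF" in MST-EDS-RF denotes. All function names and parameter values below are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

CLASSES = ["normal", "diabetic_retinopathy", "glaucoma", "cataract"]
BACKBONES = ["vit", "deit", "swin"]  # names only; no real Transformer is run here


def make_stand_in_features(n_samples=600, dim_per_view=64, seed=0):
    """Synthetic stand-in for deep features from three backbones (assumption)."""
    X, y = make_classification(
        n_samples=n_samples,
        n_features=len(BACKBONES) * dim_per_view,
        n_informative=30,
        n_classes=len(CLASSES),
        n_clusters_per_class=1,
        random_state=seed,
    )
    # Slice the columns into one feature "view" per hypothetical backbone.
    views = {
        name: X[:, i * dim_per_view:(i + 1) * dim_per_view]
        for i, name in enumerate(BACKBONES)
    }
    return views, y


def hybrid_model(n_components=16):
    """Stage 1: features -> PCA -> classical ML classifier."""
    return make_pipeline(
        StandardScaler(),
        PCA(n_components=n_components),
        LogisticRegression(max_iter=1000),
    )


def run_two_stage(seed=0):
    views, y = make_stand_in_features(seed=seed)
    idx = np.arange(len(y))
    train_idx, test_idx = train_test_split(
        idx, test_size=0.25, stratify=y, random_state=seed
    )

    meta_train, meta_test = [], []
    for name in BACKBONES:
        X = views[name]
        model = hybrid_model()
        # Out-of-fold probabilities keep training labels from leaking
        # into the meta-learner's inputs.
        oof = cross_val_predict(
            model, X[train_idx], y[train_idx], cv=5, method="predict_proba"
        )
        model.fit(X[train_idx], y[train_idx])
        meta_train.append(oof)
        meta_test.append(model.predict_proba(X[test_idx]))

    # Stage 2: stack the hybrid models' outputs and train a meta-learner.
    Z_train = np.hstack(meta_train)
    Z_test = np.hstack(meta_test)
    meta = RandomForestClassifier(n_estimators=200, random_state=seed)
    meta.fit(Z_train, y[train_idx])
    return Z_train.shape, meta.score(Z_test, y[test_idx])
```

Calling `run_two_stage()` returns the shape of the stacked training matrix (one probability column per class per hybrid model) and the meta-learner's accuracy on the held-out split. Because the inputs are synthetic, that accuracy says nothing about the 97.163% reported in the abstract.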
