Multi-class Classification of Retinal Eye Diseases from Ophthalmoscopy Images Using Transfer Learning-Based Vision Transformers.

Author Information

Cutur Elif Setenay, Inan Neslihan Gokmen

Affiliations

Graduate School of Sciences and Engineering, Data Science, Koç University, Istanbul, Turkey.

College of Engineering, Department of Computer Engineering, Koç University, Rumelifeneri Yolu, 34450, Sarıyer, Istanbul, Turkey.

Publication Information

J Imaging Inform Med. 2025 Jan 27. doi: 10.1007/s10278-025-01416-7.

Abstract

This study explores a transfer learning approach with vision transformers (ViTs) and convolutional neural networks (CNNs) for classifying retinal diseases, specifically diabetic retinopathy, glaucoma, and cataracts, from ophthalmoscopy images. Using a balanced subset of 4217 images and ophthalmology-specific pretrained ViT backbones, the method achieves significant improvements in classification accuracy and shows potential for broader applications in medical imaging. Glaucoma, diabetic retinopathy, and cataracts are common eye diseases that can cause vision loss if left untreated, so they must be identified early to prevent progressive eye damage. Deep learning (DL) has been widely used in image recognition for the early detection and treatment of eye disease. In this study, ResNet50, DenseNet121, Inception-ResNetV2, and six ViT variants are employed, and their performance in diagnosing glaucoma, cataracts, and diabetic retinopathy is evaluated. In particular, the vision transformer is applied as an automated method for diagnosing retinal eye diseases, highlighting the accuracy of pretrained deep transfer learning (DTL) structures. The updated ViT#5 model, built on the augmented-regularized pretrained backbone (AugReg ViT-L/16_224) with a learning rate of 0.00002, outperforms state-of-the-art techniques, achieving 98.1% accuracy on a publicly accessible retinal ophthalmoscopy dataset of 4217 images. In most categories, the model also outperforms the other convolutional and ViT models in accuracy, precision, recall, and F1 score. This research contributes to medical image analysis by demonstrating the potential of AI to improve the precision of eye disease diagnosis and by advocating the integration of artificial intelligence into medical diagnostics.
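
For readers who want a concrete picture of the reported setup (a pretrained AugReg ViT-L/16 fine-tuned at 224x224 input resolution with a learning rate of 0.00002 for multi-class fundus classification), the sketch below uses PyTorch and timm. It is a minimal illustration under stated assumptions, not the authors' pipeline: the timm weight tag, four-class folder layout, optimizer, batch size, and epoch count are all assumptions.

import torch
import timm
from timm.data import resolve_data_config, create_transform
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets

# AugReg ViT-L/16 pretrained at 224x224, with a fresh 4-class head
# (cataract, diabetic retinopathy, glaucoma, normal -- assumed class set).
model = timm.create_model("vit_large_patch16_224.augreg_in21k_ft_in1k",
                          pretrained=True, num_classes=4)

# Preprocessing (resize, crop, normalization) matched to the pretrained checkpoint.
preprocess = create_transform(**resolve_data_config({}, model=model))

# Hypothetical folder layout: fundus/train/<class_name>/<image>.png
train_ds = datasets.ImageFolder("fundus/train", transform=preprocess)
train_loader = DataLoader(train_ds, batch_size=16, shuffle=True, num_workers=4)

device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)  # lr as reported in the abstract

model.train()
for epoch in range(10):  # epoch count is an assumption, not reported here
    for images, labels in train_loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()

In practice, fine-tuning the full ViT-L backbone with a small learning rate such as 2e-5, rather than training only the classification head, is the usual way such transfer learning results are obtained on modest-sized medical datasets.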
