Arbab Amirali, Habibi Aref, Rabbani Hossein, Tajmirriahi Mahnoosh
Department of Electrical and Computer Engineering, Isfahan University of Technology, Isfahan, Iran.
Medical Image and Signal Processing Research Center, School of Advanced Technologies in Medicine, Isfahan University of Medical Sciences, Isfahan, Iran.
J Med Signals Sens. 2025 Jun 9;15:18. doi: 10.4103/jmss.jmss_58_24. eCollection 2025.
Optical coherence tomography (OCT) is a pivotal imaging technique for the early detection and management of critical retinal diseases, notably diabetic macular edema and age-related macular degeneration. These conditions are significant global health concerns, affecting millions and leading to vision loss if not diagnosed promptly. Current methods for OCT image classification encounter specific challenges, such as the inherent complexity of retinal structures and considerable variability across different OCT datasets.
This paper introduces a novel hybrid model that integrates the strengths of convolutional neural networks (CNNs) and a vision transformer (ViT) to overcome these obstacles. The synergy between CNNs, which excel at extracting detailed localized features, and the ViT, which is adept at recognizing long-range patterns, enables a more effective and comprehensive analysis of OCT images.
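The sketch below is a minimal, PyTorch-style illustration of one way such a CNN/ViT hybrid can be wired: a convolutional stem extracts local feature maps, which are flattened into tokens and passed through a transformer encoder before classification. The class name `HybridCNNViT`, the layer sizes, and the 4-class output (matching the OCT2017 categories CNV, DME, DRUSEN, NORMAL) are illustrative assumptions, not the authors' exact architecture, which is available in the linked repository.

```python
# Minimal sketch of a CNN + ViT hybrid classifier (hypothetical sizes,
# not the exact architecture reported in the paper).
import torch
import torch.nn as nn

class HybridCNNViT(nn.Module):
    def __init__(self, num_classes=4, embed_dim=256, depth=4, num_heads=8):
        super().__init__()
        # CNN stem: extracts localized features and downsamples the input.
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 64, kernel_size=3, stride=2, padding=1), nn.BatchNorm2d(64), nn.ReLU(),
            nn.Conv2d(64, 128, kernel_size=3, stride=2, padding=1), nn.BatchNorm2d(128), nn.ReLU(),
            nn.Conv2d(128, embed_dim, kernel_size=3, stride=2, padding=1), nn.ReLU(),
        )
        # Transformer encoder: models long-range relations between CNN feature tokens.
        # (Positional embeddings are omitted here for brevity.)
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=embed_dim, nhead=num_heads, dim_feedforward=embed_dim * 4,
            batch_first=True, norm_first=True,
        )
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=depth)
        self.cls_token = nn.Parameter(torch.zeros(1, 1, embed_dim))
        self.head = nn.Linear(embed_dim, num_classes)

    def forward(self, x):                          # x: (B, 1, H, W) grayscale OCT scan
        feats = self.cnn(x)                        # (B, C, H', W') local feature maps
        tokens = feats.flatten(2).transpose(1, 2)  # (B, H'*W', C) sequence of tokens
        cls = self.cls_token.expand(x.size(0), -1, -1)
        tokens = torch.cat([cls, tokens], dim=1)   # prepend a [CLS] token
        tokens = self.encoder(tokens)
        return self.head(tokens[:, 0])             # classify from the [CLS] token

# Example: a batch of 224x224 single-channel OCT images through the hybrid model.
model = HybridCNNViT()
logits = model(torch.randn(2, 1, 224, 224))
print(logits.shape)  # torch.Size([2, 4])
```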
While our model achieves an accuracy of 99.80% on the OCT2017 dataset, its standout feature is its parameter efficiency: it requires only 6.9 million parameters, significantly fewer than larger, more complex models such as Xception and OpticNet-71.
This efficiency underscores the model's suitability for clinical settings, where computational resources may be limited but high accuracy and rapid diagnosis are imperative. The code for this study is available at https://github.com/Amir1831/ViT4OCT.