
From Image to Sequence: Exploring Vision Transformers for Optical Coherence Tomography Classification.

Authors

Arbab Amirali, Habibi Aref, Rabbani Hossein, Tajmirriahi Mahnoosh

Affiliations

Department of Electrical and Computer Engineering, Isfahan University of Technology, Isfahan, Iran.

Medical Image and Signal Processing Research Center, School of Advanced Technologies in Medicine, Isfahan University of Medical Sciences, Isfahan, Iran.

Publication

J Med Signals Sens. 2025 Jun 9;15:18. doi: 10.4103/jmss.jmss_58_24. eCollection 2025.

Abstract

BACKGROUND

Optical coherence tomography (OCT) is a pivotal imaging technique for the early detection and management of critical retinal diseases, notably diabetic macular edema and age-related macular degeneration. These conditions are significant global health concerns, affecting millions and leading to vision loss if not diagnosed promptly. Current methods for OCT image classification encounter specific challenges, such as the inherent complexity of retinal structures and considerable variability across different OCT datasets.

METHODS

This paper introduces a novel hybrid model that integrates the strengths of convolutional neural networks (CNNs) and vision transformers (ViTs) to overcome these obstacles. The synergy between CNNs, which excel at extracting detailed localized features, and ViTs, which are adept at recognizing long-range patterns, enables a more effective and comprehensive analysis of OCT images.
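The abstract does not specify the layer configuration; the authors' code is available at https://github.com/Amir1831/ViT4OCT. As an illustration only, the following PyTorch sketch shows one common way such a hybrid can be assembled: a CNN stem extracts localized features, the resulting feature map is flattened into a token sequence, and a transformer encoder models long-range structure before a classification head. All names and sizes here (HybridCNNViT, embed_dim, depth, the 224x224 input) are hypothetical assumptions, not the paper's architecture.

```python
# Illustrative hybrid CNN + ViT classifier for OCT B-scans (assumed design,
# not the authors' model; see https://github.com/Amir1831/ViT4OCT for theirs).
import torch
import torch.nn as nn


class HybridCNNViT(nn.Module):
    def __init__(self, num_classes=4, embed_dim=256, depth=4, num_heads=8):
        super().__init__()
        # CNN stem: localized feature extraction with 8x downsampling,
        # so the transformer sees a short token sequence.
        self.stem = nn.Sequential(
            nn.Conv2d(1, 64, kernel_size=3, stride=2, padding=1),
            nn.BatchNorm2d(64),
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 128, kernel_size=3, stride=2, padding=1),
            nn.BatchNorm2d(128),
            nn.ReLU(inplace=True),
            nn.Conv2d(128, embed_dim, kernel_size=3, stride=2, padding=1),
        )
        num_tokens = (224 // 8) ** 2  # assumes 224x224 grayscale input
        self.cls_token = nn.Parameter(torch.zeros(1, 1, embed_dim))
        self.pos_embed = nn.Parameter(torch.zeros(1, num_tokens + 1, embed_dim))
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=embed_dim, nhead=num_heads, dim_feedforward=embed_dim * 4,
            batch_first=True, norm_first=True,
        )
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=depth)
        self.head = nn.Linear(embed_dim, num_classes)

    def forward(self, x):
        feats = self.stem(x)                       # (B, C, H/8, W/8)
        tokens = feats.flatten(2).transpose(1, 2)  # image -> sequence of tokens
        cls = self.cls_token.expand(x.size(0), -1, -1)
        tokens = torch.cat([cls, tokens], dim=1) + self.pos_embed
        tokens = self.encoder(tokens)              # long-range context via attention
        return self.head(tokens[:, 0])             # classify from the CLS token


model = HybridCNNViT()
logits = model(torch.randn(2, 1, 224, 224))        # e.g. CNV/DME/drusen/normal
print(logits.shape)                                # torch.Size([2, 4])
```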

RESULTS

While our model achieves an accuracy of 99.80% on the OCT2017 dataset, its standout feature is its parameter efficiency: it requires only 6.9 million parameters, significantly fewer than larger, more complex models such as Xception and OpticNet-71.
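The 6.9 million figure is taken from the paper. For readers who want to reproduce such a count for their own models, the standard PyTorch idiom is shown below, applied to the hypothetical HybridCNNViT sketch from the METHODS section rather than to the authors' released model.

```python
# Count trainable parameters of a PyTorch model.
# Uses the illustrative HybridCNNViT sketch defined above (an assumption),
# not the authors' model; their 6.9 M figure comes from the paper itself.
model = HybridCNNViT()
n_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"trainable parameters: {n_params / 1e6:.2f} M")
```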

CONCLUSION

This efficiency underscores the model's suitability for clinical settings, where computational resources may be limited but high accuracy and rapid diagnosis are imperative. The code for this study is available at https://github.com/Amir1831/ViT4OCT.
