
From Image to Sequence: Exploring Vision Transformers for Optical Coherence Tomography Classification.

Authors

Arbab Amirali, Habibi Aref, Rabbani Hossein, Tajmirriahi Mahnoosh

Affiliations

Department of Electrical and Computer Engineering, Isfahan University of Technology, Isfahan, Iran.

Medical Image and Signal Processing Research Center, School of Advanced Technologies in Medicine, Isfahan University of Medical Sciences, Isfahan, Iran.

Publication

J Med Signals Sens. 2025 Jun 9;15:18. doi: 10.4103/jmss.jmss_58_24. eCollection 2025.

Abstract

BACKGROUND

Optical coherence tomography (OCT) is a pivotal imaging technique for the early detection and management of critical retinal diseases, notably diabetic macular edema and age-related macular degeneration. These conditions are significant global health concerns, affecting millions and leading to vision loss if not diagnosed promptly. Current methods for OCT image classification encounter specific challenges, such as the inherent complexity of retinal structures and considerable variability across different OCT datasets.

METHODS

This paper introduces a novel hybrid model that integrates the strengths of convolutional neural networks (CNNs) and vision transformers (ViTs) to overcome these obstacles. The synergy between CNNs, which excel at extracting detailed localized features, and ViTs, which are adept at recognizing long-range patterns, enables a more effective and comprehensive analysis of OCT images.
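The abstract does not specify the layer configuration; the authors' code is available at https://github.com/Amir1831/ViT4OCT. As an illustration only, the following PyTorch sketch shows one common way such a hybrid can be assembled: a CNN stem extracts localized features, the resulting feature map is flattened into a token sequence, and a transformer encoder models long-range structure before a classification head. All names and sizes here (HybridCNNViT, embed_dim, depth, the 224x224 input) are hypothetical assumptions, not the paper's architecture.

```python
# Illustrative hybrid CNN + ViT classifier for OCT B-scans (assumed design,
# not the authors' model; see https://github.com/Amir1831/ViT4OCT for theirs).
import torch
import torch.nn as nn


class HybridCNNViT(nn.Module):
    def __init__(self, num_classes=4, embed_dim=256, depth=4, num_heads=8):
        super().__init__()
        # CNN stem: localized feature extraction with 8x downsampling,
        # so the transformer sees a short token sequence.
        self.stem = nn.Sequential(
            nn.Conv2d(1, 64, kernel_size=3, stride=2, padding=1),
            nn.BatchNorm2d(64),
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 128, kernel_size=3, stride=2, padding=1),
            nn.BatchNorm2d(128),
            nn.ReLU(inplace=True),
            nn.Conv2d(128, embed_dim, kernel_size=3, stride=2, padding=1),
        )
        num_tokens = (224 // 8) ** 2  # assumes 224x224 grayscale input
        self.cls_token = nn.Parameter(torch.zeros(1, 1, embed_dim))
        self.pos_embed = nn.Parameter(torch.zeros(1, num_tokens + 1, embed_dim))
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=embed_dim, nhead=num_heads, dim_feedforward=embed_dim * 4,
            batch_first=True, norm_first=True,
        )
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=depth)
        self.head = nn.Linear(embed_dim, num_classes)

    def forward(self, x):
        feats = self.stem(x)                       # (B, C, H/8, W/8)
        tokens = feats.flatten(2).transpose(1, 2)  # image -> sequence of tokens
        cls = self.cls_token.expand(x.size(0), -1, -1)
        tokens = torch.cat([cls, tokens], dim=1) + self.pos_embed
        tokens = self.encoder(tokens)              # long-range context via attention
        return self.head(tokens[:, 0])             # classify from the CLS token


model = HybridCNNViT()
logits = model(torch.randn(2, 1, 224, 224))        # e.g. CNV/DME/drusen/normal
print(logits.shape)                                # torch.Size([2, 4])
```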

RESULTS

While our model achieves an accuracy of 99.80% on the OCT2017 dataset, its standout feature is its parameter efficiency: it requires only 6.9 million parameters, significantly fewer than larger, more complex models such as Xception and OpticNet-71.
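The 6.9 million figure is taken from the paper. For readers who want to reproduce such a count for their own models, the standard PyTorch idiom is shown below, applied to the hypothetical HybridCNNViT sketch from the METHODS section rather than to the authors' released model.

```python
# Count trainable parameters of a PyTorch model.
# Uses the illustrative HybridCNNViT sketch defined above (an assumption),
# not the authors' model; their 6.9 M figure comes from the paper itself.
model = HybridCNNViT()
n_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"trainable parameters: {n_params / 1e6:.2f} M")
```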

CONCLUSION

This efficiency underscores the model's suitability for clinical settings, where computational resources may be limited but high accuracy and rapid diagnosis are imperative. The code for this study is available at https://github.com/Amir1831/ViT4OCT.
