Ruamviboonsuk Paisan, Arjkongharn Niracha, Vongsa Nattaporn, Pakaymaskul Pawin, Kaothanthong Natsuda
Department of Ophthalmology, College of Medicine, Rangsit University, Bangkok, Thailand.
Sirindhorn International Institute of Technology, Thammasat University, Bangkok, Thailand.
Taiwan J Ophthalmol. 2024 Nov 28;14(4):473-485. doi: 10.4103/tjo.TJO-D-24-00064. eCollection 2024 Oct-Dec.
Recent advances in artificial intelligence (AI) for retinal imaging fall into two major categories: discriminative and generative AI. For discriminative tasks, conventional convolutional neural networks (CNNs) remain the major AI technique. Vision transformers (ViTs), inspired by the transformer architecture in natural language processing, have emerged as a useful technique for discriminating retinal images. Compared with conventional CNNs, ViTs can attain excellent results when pretrained at sufficient scale and then transferred to specific tasks with fewer images. Many studies have reported better performance of ViTs than CNNs on common tasks such as diabetic retinopathy screening on color fundus photographs (CFP) and segmentation of retinal fluid on optical coherence tomography (OCT) images. The generative adversarial network (GAN) is the main technique for generative AI in retinal imaging. Novel images generated by GANs can be used to train AI models when datasets are imbalanced or inadequate. Foundation models are another recent advance in retinal imaging. They are pretrained with huge datasets, such as millions of CFP and OCT images, and fine-tuned for downstream tasks with much smaller datasets. RETFound, a foundation model pretrained with self-supervised learning, was found to discriminate many eye and systemic diseases better than supervised models. Large language models are foundation models that may be applied to text-related tasks, such as reports of retinal angiography. While AI technology moves forward quickly, real-world use of AI models moves slowly, widening the gap between development and deployment. Strong evidence that AI models can prevent visual loss may be required to close this gap.
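The abstract notes that ViTs pretrained at sufficient scale can be transferred to specific retinal tasks with relatively few images. As a rough illustration of that transfer-learning recipe (not the authors' method), the following is a minimal sketch assuming the torch, torchvision, and timm libraries and a hypothetical folder of graded fundus photographs ("fundus_train", one subfolder per diabetic retinopathy grade); the model name, dataset path, and hyperparameters are illustrative assumptions only.

```python
# Minimal sketch: fine-tuning an ImageNet-pretrained ViT for diabetic retinopathy
# grading on color fundus photographs. Library choices, the dataset path, and all
# hyperparameters are illustrative assumptions, not details taken from the article.
import torch
import timm
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

device = "cuda" if torch.cuda.is_available() else "cpu"

# Standard ImageNet-style preprocessing for 224x224 ViT input.
preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

# Hypothetical folder of fundus photographs, one subfolder per DR grade (0-4).
train_set = datasets.ImageFolder("fundus_train", transform=preprocess)
train_loader = DataLoader(train_set, batch_size=16, shuffle=True, num_workers=2)

# Load a ViT pretrained at scale and replace its head with a 5-class classifier.
model = timm.create_model("vit_base_patch16_224", pretrained=True, num_classes=5)
model.to(device)

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=0.05)
criterion = torch.nn.CrossEntropyLoss()

# Short fine-tuning loop; in practice one would add validation, early stopping,
# and class re-weighting for imbalanced DR grade distributions.
model.train()
for epoch in range(3):
    for images, labels in train_loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
    print(f"epoch {epoch}: last batch loss {loss.item():.4f}")
```

The same fine-tuning pattern applies to retinal foundation models such as RETFound: the pretrained backbone is loaded, a small task-specific head is attached, and training proceeds on a much smaller labeled dataset.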