Institute of Ophthalmology, University College London, London, United Kingdom.
Moorfields Eye Hospital National Health Service Foundation Trust, London, United Kingdom.
JAMA Ophthalmol. 2024 Jun 1;142(6):573-576. doi: 10.1001/jamaophthalmol.2024.1165.
IMPORTANCE: Vision-language models (VLMs) are a novel artificial intelligence technology capable of processing image and text inputs. While demonstrating strong generalist capabilities, their performance in ophthalmology has not been extensively studied.
OBJECTIVE: To assess the performance of the Gemini Pro VLM in expert-level tasks for macular diseases from optical coherence tomography (OCT) scans.
DESIGN, SETTING, AND PARTICIPANTS: This was a cross-sectional diagnostic accuracy study evaluating a generalist VLM on ophthalmology-specific tasks using the open-source Optical Coherence Tomography Image Database. The dataset included OCT B-scans from 50 unique patients: healthy individuals and those with macular hole, diabetic macular edema, central serous chorioretinopathy, and age-related macular degeneration. Each OCT scan was labeled for 10 key pathological features, referral recommendations, and treatments. The images were captured using a Cirrus high definition OCT machine (Carl Zeiss Meditec) at Sankara Nethralaya Eye Hospital, Chennai, India, and the dataset was published in December 2018. Image acquisition dates were not specified.
EXPOSURES: Gemini Pro, using a standard prompt to extract structured responses on December 15, 2023.
MAIN OUTCOMES AND MEASURES: The primary outcome was model responses compared against expert labels, calculating F1 scores for each pathological feature. Secondary outcomes included accuracy in diagnosis, referral urgency, and treatment recommendation. The model's internal concordance was evaluated by measuring the alignment between referral and treatment recommendations, independent of diagnostic accuracy.
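The per-feature F1 computation described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the binary (present/absent) encoding of each feature, and the choice to score a feature as 0 when it has no true positives, false positives, or false negatives are all assumptions.

```python
from typing import Dict, List


def f1_score(expert: List[int], model: List[int]) -> float:
    """F1 for one binary feature: 2*TP / (2*TP + FP + FN)."""
    tp = sum(1 for e, m in zip(expert, model) if e == 1 and m == 1)
    fp = sum(1 for e, m in zip(expert, model) if e == 0 and m == 1)
    fn = sum(1 for e, m in zip(expert, model) if e == 1 and m == 0)
    denom = 2 * tp + fp + fn
    # Assumption: an undefined F1 (no positives anywhere) is scored as 0.
    return 0.0 if denom == 0 else 2 * tp / denom


def per_feature_f1(
    expert: Dict[str, List[int]], model: Dict[str, List[int]]
) -> Dict[str, float]:
    """Compute F1 separately for each labeled feature."""
    return {name: f1_score(expert[name], model[name]) for name in expert}


def mean_f1(scores: Dict[str, float]) -> float:
    """Unweighted mean of the per-feature F1 scores."""
    return sum(scores.values()) / len(scores)


# Toy labels for two hypothetical features across five scans (not study data).
expert_labels = {"feature_a": [1, 1, 0, 0, 1], "feature_b": [0, 0, 0, 0, 0]}
model_labels = {"feature_a": [1, 0, 0, 1, 1], "feature_b": [0, 0, 0, 0, 0]}

scores = per_feature_f1(expert_labels, model_labels)
overall = mean_f1(scores)
```

Averaging per-feature scores without weighting treats rare and common features equally, which is one plausible reading of a "mean F1 score"; a weighted average would give a different number.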
RESULTS: The mean F1 score was 10.7% (95% CI, 2.4-19.2). Measurable F1 scores were obtained for macular hole (36.4%; 95% CI, 0-71.4), pigment epithelial detachment (26.1%; 95% CI, 0-46.2), subretinal hyperreflective material (24.0%; 95% CI, 0-45.2), and subretinal fluid (20.0%; 95% CI, 0-45.5). A correct diagnosis was achieved in 17 of 50 cases (34%; 95% CI, 22-48). Referral recommendations varied: 28 of 50 were correct (56%; 95% CI, 42-70), 10 of 50 were overcautious (20%; 95% CI, 10-32), and 12 of 50 were undercautious (24%; 95% CI, 12-36). Referral and treatment concordance were very high, with 48 of 50 (96%; 95% CI, 90-100) and 48 of 49 (98%; 95% CI, 94-100) correct answers, respectively.
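The abstract does not state how the 95% CIs were derived. As an illustration only, the sketch below applies a percentile bootstrap, one common choice for small samples, to a proportion such as the 17 of 50 correct diagnoses. The method, the function name `bootstrap_ci`, the number of resamples, and the seed are all assumptions, and the output is not expected to reproduce the published intervals exactly.

```python
import random
from typing import List, Tuple


def bootstrap_ci(
    outcomes: List[int],
    n_resamples: int = 2000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float]:
    """Percentile bootstrap CI for the mean of binary outcomes (a proportion)."""
    rng = random.Random(seed)
    n = len(outcomes)
    # Resample cases with replacement and record the proportion each time.
    stats = sorted(
        sum(rng.choices(outcomes, k=n)) / n for _ in range(n_resamples)
    )
    lo_idx = int((alpha / 2) * n_resamples)
    hi_idx = int((1 - alpha / 2) * n_resamples) - 1
    return stats[lo_idx], stats[hi_idx]


# 17 correct diagnoses out of 50 cases, as reported in the results above.
diagnosis_correct = [1] * 17 + [0] * 33
low, high = bootstrap_ci(diagnosis_correct)
```

With only 50 cases, each resampled proportion is a multiple of 2%, so bootstrap limits computed this way fall on even percentages.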
CONCLUSIONS AND RELEVANCE: In this study, a generalist VLM demonstrated limited vision capabilities for feature detection and management of macular disease. However, it showed low self-contradiction, suggesting strong language capabilities. As VLMs continue to improve, validating their performance on large benchmarking datasets will help ascertain their potential in ophthalmology.