Katuru Abhilash, Chung In Young, Majid Iyad, Shen Lucy Q, Wang Mengyu
Harvard Ophthalmology AI Lab, Schepens Eye Research Institute of Massachusetts Eye and Ear, Harvard Medical School, Boston, MA.
Department of Ophthalmology, Massachusetts Eye and Ear, Harvard Medical School, Boston, MA.
Ophthalmol Sci. 2025 Jul 5;5(6):100877. doi: 10.1016/j.xops.2025.100877. eCollection 2025 Nov-Dec.
To determine whether a deep learning (DL) model using retinal nerve fiber layer thickness (RNFLT) maps from OCT scans can detect glaucoma, defined by functional visual field (VF) impairment, more accurately than a DL model using disc photos (DPs). A secondary objective was to assess the diagnostic performance of these DL models across demographic groups (race, sex, and ethnicity).
Retrospective cohort study at a tertiary glaucoma center utilizing OCT and DP datasets collected between 2011 and 2022.
Out of the 16 936 DP and OCT image sets, patients with Cirrus OCT images with a quality score ≥6 of 10 and reliable 24-2 Humphrey VF tests (fixation loss ≤33%, false-negative rate ≤20%, false-positive rate ≤20%), taken within 30 days of OCT, were included. Disc photos were obtained within 6 months of OCT. Data were randomly selected for training and testing of the DL models.
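The inclusion criteria above can be read as a simple per-record filter. The following is a minimal sketch only, not the authors' pipeline: the `ImageSet` record, its field names, the treatment of "within 30 days" as a symmetric window, and the approximation of 6 months as 183 days are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class ImageSet:
    """Hypothetical record for one paired OCT / VF / disc photo set."""
    oct_quality: int       # Cirrus OCT quality score on a 0-10 scale
    oct_date: date
    vf_date: date
    dp_date: date
    fixation_loss: float   # reliability indices as fractions in [0, 1]
    false_negative: float
    false_positive: float


def is_eligible(s: ImageSet, dp_window_days: int = 183) -> bool:
    """Apply the stated thresholds; dp_window_days approximates 6 months (assumption)."""
    vf_reliable = (
        s.fixation_loss <= 0.33
        and s.false_negative <= 0.20
        and s.false_positive <= 0.20
    )
    # Assumption: the time windows are symmetric around the OCT date.
    vf_close = abs((s.vf_date - s.oct_date).days) <= 30
    dp_close = abs((s.dp_date - s.oct_date).days) <= dp_window_days
    return s.oct_quality >= 6 and vf_reliable and vf_close and dp_close
```

A record fails the filter as soon as any single criterion is violated, for example an OCT quality score of 5 or a VF test taken 45 days from the OCT scan.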
Development of DL models utilizing either OCT RNFLT maps or DPs to detect glaucoma based on VF-defined functional impairment.
The primary outcome was the area under the curve (AUC) for glaucoma detection, comparing the OCT-based DL model with the DP-based model. The secondary outcome was the AUC across demographic groups.
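For readers unfamiliar with the outcome measure, the AUC can be understood as the probability that a randomly chosen glaucomatous eye receives a higher model score than a randomly chosen non-glaucomatous eye. The sketch below uses this standard rank-based definition and repeats it within subgroups. It is illustrative only: the abstract does not state how the AUCs or P values were computed, and the function names and inputs here are hypothetical.

```python
def auc(labels, scores):
    """Rank-based AUC: fraction of (positive, negative) pairs ranked correctly.

    labels are 1 (glaucoma) or 0 (no glaucoma); ties count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def auc_by_group(labels, scores, groups):
    """Compute the AUC separately for each demographic group label."""
    result = {}
    for g in sorted(set(groups)):
        idx = [i for i, x in enumerate(groups) if x == g]
        result[g] = auc([labels[i] for i in idx], [scores[i] for i in idx])
    return result
```

The pairwise loop is quadratic in sample size and is written for clarity; a sort-based implementation would be preferable for datasets of the size described here.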
The OCT-based DL model achieved an AUC of 0.90, significantly outperforming the DP-based model (AUC = 0.86, P < 0.005), with superior performance consistent across demographic groups. The accuracies of both the OCT and DP models varied significantly across demographic groups. For the OCT model, AUCs were 0.93, 0.92, and 0.92 for Asians, Blacks, and Whites, respectively (P < 0.005); 0.89 for women versus 0.93 for men (P = 0.005); and 0.92 for Hispanics versus 0.94 for non-Hispanics (P < 0.005). For the DP model, corresponding AUCs for race were 0.87, 0.90, and 0.82 (P < 0.005); for sex, 0.856 versus 0.862 (P < 0.005); and for ethnicity, 0.85 for Hispanics versus 0.79 for non-Hispanics (P < 0.005).
When glaucoma diagnosis was based on functional deficit, the OCT-based DL model offered greater accuracy in detecting glaucoma than the DP-based model, likely due to its use of objective and quantitative RNFLT measurements. This work supports the use of OCT-based DL models for glaucoma detection, while observed demographic disparities underscore the need for equitable datasets to ensure fair DL-driven glaucoma diagnosis across populations.
Proprietary or commercial disclosure may be found in the Footnotes and Disclosures at the end of this article.