Wilmer Eye Institute, Johns Hopkins University School of Medicine, Baltimore, MD, USA.
Malone Center for Engineering, Johns Hopkins University, Baltimore, MD, USA.
Sci Rep. 2023 Jan 19;13(1):1041. doi: 10.1038/s41598-023-28003-6.
Glaucoma is a leading cause of irreversible blindness, and its worsening is most often monitored with visual field (VF) testing. Deep learning models (DLMs) may help identify VF worsening consistently and reproducibly. In this study, we developed a DLM and investigated its performance on a large population of glaucoma patients. We included 5099 patients (8705 eyes) seen at a single institution from June 1990 to June 2020 who had VF testing as well as a clinician assessment of VF worsening. Since there is no gold standard for identifying VF worsening, we used a consensus of six commonly used algorithmic methods, which include global regressions as well as point-wise change in the VFs. We used the consensus decision as the reference standard to train and test the DLM and to evaluate clinician performance. 80%, 10%, and 10% of patients were included in the training, validation, and test sets, respectively. Of the 873 eyes in the test set, 309 (60.6%) were from females, and the median age was 62.4 years (IQR 54.8-68.9). The DLM achieved an AUC of 0.94 (95% CI 0.93-0.99). Even after the 6 most recent VFs were removed, which gave the model fewer data points, the DLM identified worsening with an AUC of 0.78 (95% CI 0.72-0.84). Clinician assessment of worsening (based on documentation in the health record at the time of the final VF in each eye) had an AUC of 0.64 (95% CI 0.63-0.66). Both the DLM and clinicians performed worse when the initial disease was more severe. These data show that a DLM trained on a consensus of methods to define worsening successfully identified VF worsening and could help guide clinicians during routine clinical care.
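The evaluation setup described above (a consensus reference label, a split by patient, and AUC against the reference) can be sketched as follows. This is a minimal illustration and not the authors' code: the function names, the `min_agree=4` threshold, and the random seed are all assumptions, since the abstract does not state how the six methods are combined or how patients were assigned to sets.

```python
import random

def consensus_label(method_votes, min_agree=4):
    """Return True if at least `min_agree` methods flag worsening.

    ASSUMPTION: the threshold of 4 out of 6 is hypothetical; the abstract
    only says a consensus of six methods was used.
    """
    return sum(bool(v) for v in method_votes) >= min_agree

def split_by_patient(eye_to_patient, fractions=(0.8, 0.1, 0.1), seed=0):
    """Assign eyes to train/val/test so both eyes of a patient share a set."""
    patients = sorted(set(eye_to_patient.values()))
    random.Random(seed).shuffle(patients)
    n_train = int(fractions[0] * len(patients))
    n_val = int(fractions[1] * len(patients))
    set_of = {}
    for i, patient in enumerate(patients):
        if i < n_train:
            set_of[patient] = "train"
        elif i < n_train + n_val:
            set_of[patient] = "val"
        else:
            set_of[patient] = "test"
    return {eye: set_of[patient] for eye, patient in eye_to_patient.items()}

def auc(labels, scores):
    """AUC as the chance a random positive outscores a random negative.

    Ties count as half a win. Quadratic in the number of eyes, which is
    fine for a sketch but slow for large test sets.
    """
    pos = [s for l, s in zip(labels, scores) if l]
    neg = [s for l, s in zip(labels, scores) if not l]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

Splitting on patients rather than eyes keeps the two eyes of one person out of different sets, which is consistent with the abstract's statement that percentages of patients, not eyes, were allocated.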