Malone Center for Engineering in Healthcare, Johns Hopkins University School of Medicine, Baltimore, Maryland; Department Artificial Intelligence in Biomedical Engineering, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany.
Wilmer Eye Institute, Johns Hopkins University School of Medicine, Baltimore, Maryland.
Ophthalmol Glaucoma. 2024 May-Jun;7(3):222-231. doi: 10.1016/j.ogla.2024.01.005. Epub 2024 Jan 29.
Develop and evaluate the performance of a deep learning model (DLM) that forecasts eyes with low future visual field (VF) variability, and study the impact of using this DLM on sample size requirements for neuroprotective trials.
Retrospective cohort and simulation study.
We included 1 eye per patient with baseline reliable VFs, OCT, clinical measures (demographics, intraocular pressure, and visual acuity), and 5 subsequent reliable VFs to forecast VF variability using DLMs and perform sample size estimates. We estimated sample size for 3 groups of eyes: all eyes (AE), low variability eyes (LVE: the subset of AE with a standard deviation of mean deviation [MD] slope residuals in the bottom 25th percentile), and DLM-predicted low variability eyes (DLPE: the subset of AE predicted to be low variability by the DLM). Deep learning models using only baseline VF/OCT/clinical data as input (DLM1), or also using a second VF (DLM2), were constructed to predict low VF variability (yielding DLPE1 and DLPE2, respectively). Data were split 60%/10%/30% into training, validation, and test sets. Clinical trial simulations were performed only on the test set. We estimated the sample size necessary to detect treatment effects of 20% to 50% in MD slope with 80% power. Power was defined as the percentage of simulated clinical trials in which the MD slope differed significantly from that of the control group. Clinical trials were simulated with visits every 3 months, for a total of 10 visits.
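The variability measure and the trial simulation described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the statistical test, the distribution of MD slopes, or the noise model, so the function names, the default parameters (`mean_slope`, `slope_sd`, `noise_sd`), the Gaussian noise, and the two-sided z-test on per-eye slopes are all assumptions chosen for illustration.

```python
import numpy as np


def md_slope_and_residual_sd(times_years, md_values):
    """Fit MD (dB) against time (years) by ordinary least squares.

    Returns the slope and the standard deviation of the residuals,
    which serves here as the per-eye variability measure.
    """
    times_years = np.asarray(times_years, dtype=float)
    md_values = np.asarray(md_values, dtype=float)
    slope, intercept = np.polyfit(times_years, md_values, 1)
    residuals = md_values - (slope * times_years + intercept)
    # ddof=2: two parameters (slope and intercept) were estimated.
    return slope, residuals.std(ddof=2)


def flag_low_variability(residual_sds, percentile=25.0):
    """Mark eyes whose residual SD is at or below the given percentile."""
    residual_sds = np.asarray(residual_sds, dtype=float)
    return residual_sds <= np.percentile(residual_sds, percentile)


def simulate_power(n_per_arm, effect, mean_slope=-0.5, slope_sd=0.3,
                   noise_sd=1.0, n_visits=10, visit_interval_years=0.25,
                   n_trials=500, seed=0):
    """Estimate power as the fraction of simulated trials with a
    significant difference in mean MD slope between arms.

    Assumptions (not from the source): Gaussian true slopes, Gaussian
    visit-to-visit noise, and a two-sided z-test at alpha = 0.05.
    """
    rng = np.random.default_rng(seed)
    t = np.arange(n_visits) * visit_interval_years
    t_centered = t - t.mean()
    denom = (t_centered ** 2).sum()
    significant = 0
    for _ in range(n_trials):
        arm_slopes = []
        # Control arm has effect 0; treatment slows the slope by `effect`.
        for arm_effect in (0.0, effect):
            true_slopes = rng.normal(mean_slope * (1.0 - arm_effect),
                                     slope_sd, n_per_arm)
            noise = rng.normal(0.0, noise_sd, (n_per_arm, n_visits))
            md = true_slopes[:, None] * t + noise
            # Closed-form OLS slope for each simulated eye.
            arm_slopes.append(md @ t_centered / denom)
        control, treated = arm_slopes
        se = np.sqrt(control.var(ddof=1) / n_per_arm
                     + treated.var(ddof=1) / n_per_arm)
        z = (treated.mean() - control.mean()) / se
        significant += abs(z) > 1.96
    return significant / n_trials


if __name__ == "__main__":
    # Hypothetical comparison: same trial size, lower vs. higher VF noise.
    for sd in (1.0, 3.0):
        print(sd, simulate_power(n_per_arm=100, effect=0.5, noise_sd=sd))
```

Under these assumptions, the required sample size for a given treatment effect could be found by increasing `n_per_arm` until `simulate_power` first reaches 0.80, once for each group of eyes.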
A total of 2817 eyes were included in the analysis. Deep learning models 1 and 2 achieved areas under the receiver operating characteristic curve of 0.73 (95% confidence interval [CI]: 0.68, 0.76) and 0.82 (95% CI: 0.78, 0.85), respectively, in forecasting low VF variability. Compared with including AE, using DLPE1 and DLPE2 reduced the sample size needed to achieve 80% power by 30% and 38%, respectively, for a 30% treatment effect, and by 31% and 38%, respectively, for a 50% treatment effect.
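A rough way to see why selecting low-variability eyes lowers the required sample size is the textbook approximation for comparing two group means. This is not the method used in the study, whose figures come from simulation; here $\sigma$ denotes the between-eye standard deviation of the estimated MD slope, $\delta$ the difference in mean slope to be detected, and both are assumed to be the only quantities that change.

```latex
% Approximate sample size per arm, two-sided test at level \alpha, power 1-\beta
n \approx \frac{2\,\bigl(z_{1-\alpha/2} + z_{1-\beta}\bigr)^{2}\,\sigma^{2}}{\delta^{2}}

% With \alpha = 0.05 and 80% power:
% z_{0.975} + z_{0.80} \approx 1.96 + 0.84 = 2.80, so n \approx 15.7\,\sigma^{2}/\delta^{2}

% For a fixed \delta, the ratio of sample sizes between a selected
% subset and all eyes depends only on the variance ratio:
\frac{n_{\text{subset}}}{n_{\text{all}}} \approx \frac{\sigma_{\text{subset}}^{2}}{\sigma_{\text{all}}^{2}}
```

Under this approximation, a 30% reduction in sample size would correspond to a slope standard deviation of about $\sqrt{0.70} \approx 0.84$ times that of all eyes, and a 38% reduction to about $\sqrt{0.62} \approx 0.79$ times. Because the treatment effect is defined as a percentage of MD slope, $\delta$ may also differ between subsets, so these figures are only indicative.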
Deep learning models can forecast eyes with low VF variability using data from a single baseline clinical visit. This can reduce sample size requirements, and potentially reduce the burden of future glaucoma clinical trials.
FINANCIAL DISCLOSURE(S): Proprietary or commercial disclosure may be found in the Footnotes and Disclosures at the end of this article.