Division of Biostatistics, School of Public Health, University of California, Berkeley, CA, USA.
Division of Research, Kaiser Permanente Northern California, Oakland, CA, USA.
BMC Med. 2022 Sep 15;20(1):307. doi: 10.1186/s12916-022-02499-7.
Gestational diabetes (GDM) is prevalent, and because the window to improve glycemic control is short, timely and effective treatment matters. Clinicians face major barriers to choosing effectively among treatment modalities [medical nutrition therapy (MNT) with or without pharmacologic treatment (antidiabetic oral agents and/or insulin)]. We investigated whether clinical data at varied stages of pregnancy can predict GDM treatment modality.
Among a population-based cohort of 30,474 pregnancies with GDM delivered at Kaiser Permanente Northern California in 2007-2017, we selected those in 2007-2016 as the discovery set and 2017 as the temporal/future validation set. Potential predictors were extracted from electronic health records at different timepoints (levels 1-4): (1) 1-year preconception to the last menstrual period, (2) the last menstrual period to GDM diagnosis, (3) at GDM diagnosis, and (4) 1 week after GDM diagnosis. We compared transparent and ensemble machine learning prediction methods, including least absolute shrinkage and selection operator (LASSO) regression and super learner, containing classification and regression tree, LASSO regression, random forest, and extreme gradient boosting algorithms, to predict risks for pharmacologic treatment beyond MNT.
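The temporal split and the cumulative predictor levels described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the `Pregnancy` record, the `LEVELS` mapping, and every predictor name in it are hypothetical placeholders chosen for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Pregnancy:
    """Hypothetical record: delivery year plus a dict of predictor values."""
    delivery_year: int
    features: dict = field(default_factory=dict)


# Hypothetical assignment of predictor names to the four timepoint levels.
# The real study extracted many more predictors from electronic health records.
LEVELS = {
    1: ["prepregnancy_bmi"],                # 1 year preconception to LMP
    2: ["early_pregnancy_weight_gain"],     # LMP to GDM diagnosis
    3: ["fasting_glucose_at_diagnosis"],    # at GDM diagnosis
    4: ["fasting_glucose_in_range_week1"],  # 1 week after GDM diagnosis
}


def temporal_split(records, validation_year=2017):
    """Earlier deliveries form the discovery set; the final year is held out."""
    discovery = [r for r in records if r.delivery_year < validation_year]
    validation = [r for r in records if r.delivery_year == validation_year]
    return discovery, validation


def predictors_up_to(level):
    """Cumulative predictor names for levels 1..level (e.g. 'levels 1-3')."""
    names = []
    for k in range(1, level + 1):
        names.extend(LEVELS[k])
    return names


records = [Pregnancy(2007), Pregnancy(2016), Pregnancy(2017)]
discovery, validation = temporal_split(records)
```

Holding out the most recent year, rather than a random sample, checks whether a model fitted on past pregnancies still performs on later ones.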
The super learner using levels 1-4 predictors had higher predictability [tenfold cross-validated C-statistic in discovery/validation set: 0.934 (95% CI: 0.931-0.936)/0.815 (95% CI: 0.800-0.829)] than the super learners using levels 1, 1-2, and 1-3 predictors (discovery/validation set C-statistic: 0.683-0.869/0.634-0.754). A simpler, more interpretable model, including timing of GDM diagnosis, diagnostic fasting glucose value, and the status and frequency of glycemic control at fasting during the first week after diagnosis, was developed using tenfold cross-validated logistic regression based on super learner-selected predictors. Compared to the super learner, this model had only a modest reduction in predictability [discovery/validation set C-statistic: 0.825 (95% CI: 0.820-0.830)/0.798 (95% CI: 0.783-0.813)].
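For readers unfamiliar with the C-statistic reported above: for a binary outcome it is the probability that a randomly chosen pregnancy that received pharmacologic treatment is assigned a higher predicted risk than a randomly chosen pregnancy managed with MNT alone, with ties counted as one half. The sketch below computes it by direct pairwise comparison. The function name and the toy data are illustrative assumptions, and the abstract does not state how the authors computed the statistic or its confidence intervals.

```python
def c_statistic(y_true, y_score):
    """Concordance (C) statistic for a binary outcome.

    y_true: iterable of 0/1 labels (1 = pharmacologic treatment, by assumption).
    y_score: iterable of predicted risks, in the same order as y_true.
    """
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("Need at least one positive and one negative outcome.")

    concordant = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                concordant += 1.0   # treated case ranked above untreated case
            elif p == n:
                concordant += 0.5   # ties count as half-concordant
    return concordant / (len(pos) * len(neg))


# Toy data: 2 treated and 2 untreated pregnancies with predicted risks.
labels = [1, 1, 0, 0]
risks = [0.9, 0.4, 0.5, 0.1]
result = c_statistic(labels, risks)  # 3 of 4 pairs concordant -> 0.75
```

A value of 0.5 corresponds to chance-level discrimination and 1.0 to perfect ranking. The pairwise loop is quadratic in cohort size, so a rank-based implementation would be preferable for a cohort of tens of thousands.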
Clinical data demonstrated reasonably high predictability for GDM treatment modality at the time of GDM diagnosis and high predictability at 1-week post GDM diagnosis. These population-based, clinically oriented models may support algorithm-based risk-stratification for treatment modality, inform timely treatment, and catalyze more effective management of GDM.