Tallon Erin M, Williams David D, Schweisberger Cintya, Mullaney Colin, Lockee Brent, Ferro Diana, Vandervelden Craig A, Barnes Mitchell S, Sarteau Angelica Cristello, Kahkoska Anna R, Patton Susana R, Mehta Sanjeev, McDonough Ryan, Lind Marcus, D'Avolio Leonard, Clements Mark A
Division of Pediatric Endocrinology and Diabetes, Children's Mercy Kansas City, 2401 Gillham Road, Kansas City, MO, United States, 1 8166014023.
Department of Pediatrics, UMKC School of Medicine, Kansas City, MO, United States.
JMIR Diabetes. 2025 Sep 25;10:e69142. doi: 10.2196/69142.
Clinicians currently lack an effective means of identifying youth with type 1 diabetes (T1D) who are at risk of glycemic deterioration between diabetes clinic visits. As a result, their ability to identify the youth who would benefit most from targeted interventions designed to address rising glycemic levels is limited. Although risk predictions based on electronic health records (EHRs) have been used to forecast health outcomes in T1D, no study has investigated whether EHR data can identify youth with T1D who will experience a clinically significant rise in glycated hemoglobin (HbA1c) of ≥0.3% (approximately 3 mmol/mol) between diabetes clinic visits.
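The ≥0.3% threshold and its approximate 3 mmol/mol equivalent can be related through the standard NGSP-to-IFCC master equation, IFCC (mmol/mol) = 10.929 × (NGSP % − 2.15). Because the equation is linear, a *change* in HbA1c converts by the slope alone. The sketch below assumes this standard conversion. The function names are hypothetical and do not come from the study.

```python
# NGSP (%) -> IFCC (mmol/mol) master equation: IFCC = 10.929 * (NGSP - 2.15)
IFCC_SLOPE = 10.929
NGSP_INTERCEPT = 2.15


def ngsp_to_ifcc(hba1c_pct: float) -> float:
    """Convert an HbA1c level from NGSP % units to IFCC mmol/mol."""
    return IFCC_SLOPE * (hba1c_pct - NGSP_INTERCEPT)


def delta_pct_to_mmol(delta_pct: float) -> float:
    """Convert a change in HbA1c (% units) to a change in mmol/mol.

    The intercept cancels when two levels are subtracted, so only the slope matters.
    """
    return IFCC_SLOPE * delta_pct


if __name__ == "__main__":
    print(f"0.3% rise ~ {delta_pct_to_mmol(0.3):.1f} mmol/mol")
```

Under this conversion a 0.3% rise corresponds to about 3.3 mmol/mol, consistent with the "approximately 3 mmol/mol" quoted above.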
We aimed to evaluate the feasibility of using routinely collected EHR data to develop a machine learning model that predicts the 90-day unit-change in HbA1c (in % units) in youth (aged 9-18 y) with T1D. We also assessed the model's ability to augment clinical decision-making by identifying a percent change cut point that optimized identification of youth who would experience a clinically significant rise in HbA1c.
From a cohort of 2757 youth with T1D who received care from a network of pediatric diabetes clinics in the Midwestern United States (January 2012-August 2017), we identified 1743 youth with 9643 HbA1c observation windows (ie, 2 HbA1c measurements separated by 70-110 d, approximating the 90-day interval between routine diabetes clinic visits). We used up to 5 years of each youth's longitudinal EHR data to generate 17,466 features (demographics, laboratory results, vital signs, anthropometric measures, medications, diagnosis codes, procedure codes, and free-text data) for model training. We performed 3-fold cross-validation to train random forest regression models to predict 90-day unit-change in HbA1c(%).
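The observation-window definition and a 3-fold split can be sketched as follows. This is a minimal sketch under stated assumptions, not the study's pipeline: it assumes, hypothetically, that windows are formed from *consecutive* HbA1c results within each youth, that the outcome is the later value minus the earlier value, and that folds are assigned by patient so that no youth appears in both training and test folds. The abstract does not specify any of these choices, and the names `Measurement`, `Window`, `build_windows`, and `assign_folds` are illustrative.

```python
from dataclasses import dataclass
from datetime import date
from itertools import groupby

MIN_GAP_DAYS = 70   # lower bound of the window, per the abstract
MAX_GAP_DAYS = 110  # upper bound of the window, per the abstract


@dataclass(frozen=True)
class Measurement:
    patient_id: str
    drawn_on: date
    hba1c_pct: float


@dataclass(frozen=True)
class Window:
    patient_id: str
    start: date
    end: date
    unit_change: float  # later HbA1c (%) minus earlier HbA1c (%)


def build_windows(measurements: list[Measurement]) -> list[Window]:
    """Pair consecutive HbA1c results per patient that are 70-110 days apart.

    Assumption: only consecutive results are paired; the study may differ.
    """
    windows = []
    ordered = sorted(measurements, key=lambda m: (m.patient_id, m.drawn_on))
    for patient_id, group in groupby(ordered, key=lambda m: m.patient_id):
        results = list(group)
        for earlier, later in zip(results, results[1:]):
            gap = (later.drawn_on - earlier.drawn_on).days
            if MIN_GAP_DAYS <= gap <= MAX_GAP_DAYS:
                # Round to 2 decimals to suppress floating-point noise.
                change = round(later.hba1c_pct - earlier.hba1c_pct, 2)
                windows.append(Window(patient_id, earlier.drawn_on, later.drawn_on, change))
    return windows


def assign_folds(windows: list[Window], k: int = 3) -> list[int]:
    """Assign each window a fold index, grouping all of a patient's windows together.

    Assumption: patient-level grouping; patients are dealt round-robin in sorted order.
    """
    patients = sorted({w.patient_id for w in windows})
    fold_of = {p: i % k for i, p in enumerate(patients)}
    return [fold_of[w.patient_id] for w in windows]
```

Grouping folds by patient is one common way to avoid optimistic error estimates when a youth contributes several windows. A window-level split is the simpler alternative.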
Across the 3 cross-validation folds, the average root-mean-square error was 0.88 (95% CI 0.85-0.90). Predicted HbA1c(%) correlated strongly with true HbA1c(%) (r=0.79; 95% CI 0.78-0.80). The top 10 features influencing model predictions included postal code, various HbA1c-related metrics, and the frequency of a diagnosis code indicating difficulty with treatment engagement. At a clinically significant rise threshold of ≥0.3% (approximately 3 mmol/mol), the model's positive predictive value was 60.3%, a 1.5-fold enrichment relative to the observed frequency of this outcome (3928/9643 windows, 40.7%). Model sensitivity and positive predictive value improved when the threshold for clinical significance was set at smaller changes in HbA1c, whereas specificity and negative predictive value improved when it required larger changes in HbA1c.
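The error and threshold-based metrics reported above can be computed from paired predicted and observed 90-day changes, as in the sketch below. It assumes, hypothetically, that a window counts as positive when the change is at or above the threshold (≥0.3%) for both the prediction and the observation, and that enrichment is PPV divided by the observed prevalence of the outcome. The helper names are illustrative, and the toy values in any usage are not study data.

```python
import math


def rmse(predicted: list[float], observed: list[float]) -> float:
    """Root-mean-square error between predicted and observed values."""
    n = len(observed)
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / n)


def threshold_metrics(predicted: list[float], observed: list[float],
                      threshold: float = 0.3) -> dict[str, float]:
    """Sensitivity, specificity, PPV, NPV, prevalence and enrichment at a rise threshold."""
    tp = fp = tn = fn = 0
    for p, o in zip(predicted, observed):
        flagged = p >= threshold  # model predicts a clinically significant rise
        rose = o >= threshold     # the rise actually occurred
        if flagged and rose:
            tp += 1
        elif flagged:
            fp += 1
        elif rose:
            fn += 1
        else:
            tn += 1
    nan = float("nan")
    prevalence = (tp + fn) / (tp + fp + tn + fn)
    ppv = tp / (tp + fp) if tp + fp else nan
    return {
        "sensitivity": tp / (tp + fn) if tp + fn else nan,
        "specificity": tn / (tn + fp) if tn + fp else nan,
        "ppv": ppv,
        "npv": tn / (tn + fn) if tn + fn else nan,
        "prevalence": prevalence,
        # Enrichment: how much more often flagged windows rise than windows overall.
        "enrichment": ppv / prevalence if prevalence else nan,
    }
```

With the reported figures, the enrichment arithmetic is 0.603 / (3928/9643) ≈ 0.603 / 0.407 ≈ 1.48, which rounds to the stated 1.5-fold. Sweeping `threshold` over a range of values is one way to see the sensitivity/specificity trade-off described above.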
Routinely collected EHR data can be used to create a machine learning model that predicts unit-change in HbA1c between diabetes clinic visits among youth with T1D. Future work will focus on optimizing model performance and validating the model in additional cohorts and diabetes clinics.