San Francisco VA Health Care System.
Northern California Institute for Research and Education.
Med Care. 2022 Jun 1;60(6):470-479. doi: 10.1097/MLR.0000000000001720. Epub 2022 Mar 30.
It is unclear whether machine learning methods yield more accurate electronic health record (EHR) prediction models than traditional regression methods.
The objective of this study was to compare machine learning and traditional regression models for 10-year mortality prediction using EHR data.
This was a cohort study.
Data came from the Veterans Affairs (VA) EHR.
Subjects were veterans older than 50 years with a primary care visit in 2005, divided into separate training and testing cohorts (n = 124,360 each).
The primary outcome was 10-year all-cause mortality. We considered 924 potential predictors across a wide range of EHR data elements including demographics (3), vital signs (9), medication classes (399), disease diagnoses (293), laboratory results (71), and health care utilization (149). We compared discrimination (c-statistics), calibration metrics, and diagnostic test characteristics (sensitivity, specificity, and positive and negative predictive values) of machine learning and regression models.
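As an illustration of the measures named above, the following is a minimal sketch of how a c-statistic and the four diagnostic test characteristics can be computed from predicted risks. It is not the study's code: the function names, the toy outcomes and risks, and the 0.5 risk threshold are all assumptions chosen for the example.

```python
from typing import Dict, Sequence


def c_statistic(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """Share of (death, survivor) pairs in which the person who died
    was assigned the higher predicted risk; ties count as one half."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                concordant += 1.0
            elif p == n:
                concordant += 0.5
    return concordant / (len(pos) * len(neg))


def diagnostic_characteristics(
    y_true: Sequence[int], y_score: Sequence[float], threshold: float
) -> Dict[str, float]:
    """Sensitivity, specificity, PPV and NPV when predicted risks at or
    above `threshold` are classified as predicted deaths."""
    tp = fp = tn = fn = 0
    for y, s in zip(y_true, y_score):
        predicted_death = s >= threshold
        if predicted_death and y == 1:
            tp += 1
        elif predicted_death and y == 0:
            fp += 1
        elif not predicted_death and y == 0:
            tn += 1
        else:
            fn += 1

    def ratio(num: int, den: int) -> float:
        # Undefined ratios (empty denominator) are reported as NaN.
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


# Toy data (hypothetical): 1 = died within 10 years, 0 = survived.
outcomes = [1, 1, 1, 0, 0, 0, 1, 0]
risks = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2, 0.7, 0.6]

c_stat = c_statistic(outcomes, risks)
metrics = diagnostic_characteristics(outcomes, risks, threshold=0.5)
print(f"c-statistic: {c_stat:.3f}")
print(metrics)
```

The pairwise loop is quadratic in cohort size, so for a cohort as large as the one described here a rank-based formula would be the practical choice; the pairwise form is shown because it matches the definition most directly.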
The cohort's mean (SD) age was 68.2 (10.5) years; 93.9% were male, and 39.4% died within 10 years. Models yielded testing cohort c-statistics between 0.827 and 0.837. Using all 924 predictors, the Gradient Boosting model yielded the highest c-statistic [0.837, 95% confidence interval (CI): 0.835-0.839]. The full (unselected) logistic regression model had the highest c-statistic among the regression models (0.833, 95% CI: 0.830-0.835) but showed evidence of overfitting. The stepwise selection logistic model (101 predictors) had similar discrimination (0.832, 95% CI: 0.830-0.834) with minimal overfitting. All models were well calibrated and had similar diagnostic test characteristics.
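The abstract does not state how the 95% CIs around the c-statistics were obtained. One common option is a percentile bootstrap, sketched below under that assumption. The data are synthetic, and the function names, sample size, number of resamples, and random seed are all hypothetical choices for the example.

```python
import random
from typing import List, Sequence, Tuple


def c_statistic(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """Rank-based (Mann-Whitney) c-statistic with average ranks for ties."""
    order = sorted(range(len(y_score)), key=lambda i: y_score[i])
    ranks = [0.0] * len(y_score)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of tied scores starting at position i.
        while j + 1 < len(order) and y_score[order[j + 1]] == y_score[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average 1-based rank of the tied run
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one event and one non-event")
    rank_sum = sum(r for r, y in zip(ranks, y_true) if y == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def bootstrap_ci(
    y_true: Sequence[int],
    y_score: Sequence[float],
    n_boot: int = 500,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float]:
    """Percentile bootstrap CI: resample subjects with replacement and
    take the alpha/2 and 1 - alpha/2 quantiles of the c-statistic."""
    rng = random.Random(seed)
    n = len(y_true)
    stats: List[float] = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [y_true[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain only one class
            stats.append(c_statistic(ys, [y_score[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper


# Synthetic cohort (hypothetical): about 40% events, with predicted
# risks shifted upward for subjects who died.
rng = random.Random(42)
outcomes = [1 if rng.random() < 0.4 else 0 for _ in range(200)]
risks = [0.3 * y + rng.random() for y in outcomes]

point = c_statistic(outcomes, risks)
ci_low, ci_high = bootstrap_ci(outcomes, risks)
print(f"c-statistic {point:.3f} (95% CI: {ci_low:.3f}-{ci_high:.3f})")
```

With a testing cohort of 124,360 subjects the interval would be far narrower than in this 200-subject toy example, which is consistent with the tight CIs reported above.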
Our results should be confirmed in non-VA EHRs.
The difference in c-statistic between the best machine learning model (924-predictor Gradient Boosting) and the 101-predictor stepwise logistic model for 10-year mortality prediction was modest, suggesting that stepwise regression remains a reasonable method for developing VA EHR mortality prediction models.