Department of Health Services Research and Policy, London School of Hygiene and Tropical Medicine, Keppel Street, London WC1E 7HT, UK; Clinical Effectiveness Unit, Royal College of Surgeons of England, Lincoln's Inn Fields, London WC2A 3PE, UK.
J Clin Epidemiol. 2021 May;133:43-52. doi: 10.1016/j.jclinepi.2020.12.018. Epub 2021 Jan 22.
The objective of the study was to compare the performance of logistic regression and boosted trees for predicting patient mortality from large sets of diagnosis codes in electronic healthcare records.
We analyzed national hospital records and official death records for patients in England in 2015-2017 who had a myocardial infarction (n = 200,119), a hip fracture (n = 169,646), or colorectal cancer surgery (n = 56,515). One-year mortality was predicted from patient age, sex, and socioeconomic status, together with 202 to 257 binary predictors indicating whether each International Classification of Diseases, 10th Revision (ICD-10) code had been recorded in the preceding year. Performance measures included the c-statistic, the scaled Brier score, and several measures of calibration.
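The performance measures named above can be illustrated with a small pure-Python sketch. The function names and the five-patient toy data are hypothetical and are not the authors' code. The sketch assumes the common definitions: the c-statistic is the probability that a patient who died was assigned a higher predicted risk than a patient who survived (ties count as one half); the scaled Brier score is 1 - Brier/Brier_max, where Brier_max = rate * (1 - rate) is the Brier score of a model that predicts the overall mortality rate for every patient; and the observed/expected (O/E) ratio serves as one example of an overall calibration measure. The study's own set of calibration measures may differ.

```python
from typing import Sequence


def c_statistic(y: Sequence[int], p: Sequence[float]) -> float:
    """Probability that a random death gets a higher predicted risk than a
    random survivor; tied predictions count as half-concordant."""
    events = [pi for yi, pi in zip(y, p) if yi == 1]
    survivors = [pi for yi, pi in zip(y, p) if yi == 0]
    if not events or not survivors:
        raise ValueError("need at least one death and one survivor")
    concordant = 0.0
    for pe in events:  # O(n_events * n_survivors): fine for a toy example
        for ps in survivors:
            if pe > ps:
                concordant += 1.0
            elif pe == ps:
                concordant += 0.5
    return concordant / (len(events) * len(survivors))


def brier_score(y: Sequence[int], p: Sequence[float]) -> float:
    """Mean squared difference between predicted risk and observed outcome."""
    return sum((pi - yi) ** 2 for yi, pi in zip(y, p)) / len(y)


def scaled_brier_score(y: Sequence[int], p: Sequence[float]) -> float:
    """1 - Brier / Brier_max, where Brier_max = rate * (1 - rate) is the Brier
    score of a model that predicts the overall event rate for every patient."""
    rate = sum(y) / len(y)
    return 1.0 - brier_score(y, p) / (rate * (1.0 - rate))


def observed_expected_ratio(y: Sequence[int], p: Sequence[float]) -> float:
    """Overall calibration: observed deaths / sum of predicted risks (1 = perfect)."""
    return sum(y) / sum(p)


if __name__ == "__main__":
    died = [1, 0, 1, 0, 0]            # hypothetical outcomes
    risk = [0.8, 0.3, 0.6, 0.6, 0.1]  # hypothetical predicted risks
    print(round(c_statistic(died, risk), 3))              # 0.917
    print(round(scaled_brier_score(died, risk), 3))       # 0.45
    print(round(observed_expected_ratio(died, risk), 3))  # 0.833
```

A scaled Brier score of 0 means the model does no better than predicting the average mortality rate for everyone, which makes it comparable across cohorts with very different mortality (9.3% to 27.2% here). The pairwise loop is quadratic in sample size; for cohorts of the size analyzed in this study, a rank-based formula would be used instead.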
One-year mortality was 17.2% (34,520 deaths) after myocardial infarction, 27.2% (46,115) after hip fracture, and 9.3% (5,273) after colorectal cancer surgery. Optimism-adjusted c-statistics for the logistic regression models were 0.884 (95% confidence interval [CI]: 0.882, 0.886), 0.798 (0.796, 0.800), and 0.811 (0.805, 0.817) for myocardial infarction, hip fracture, and colorectal cancer surgery, respectively. The corresponding c-statistics for the boosted tree models were 0.891 (95% CI: 0.889, 0.892), 0.804 (0.802, 0.806), and 0.803 (0.797, 0.809). Model performance was also similar when measured using scaled Brier scores, and all models were well calibrated overall.
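The abstract reports optimism-adjusted c-statistics without stating how the optimism was estimated. A widely used approach is Harrell's bootstrap correction: refit the model in each bootstrap sample, measure how much better it discriminates in that sample than in the original data, and subtract the average difference from the apparent c-statistic. The sketch below assumes that approach. It uses a deliberately simple gradient-descent logistic regression on simulated binary "diagnosis code" flags and computes the c-statistic with the Mann-Whitney rank formula. All names (`auc`, `fit_logistic`, `optimism_adjusted_auc`, `simulate_codes`) and the simulated data are hypothetical. Real analyses typically use a few hundred bootstrap replicates.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Model = Callable[[Sequence[float]], float]  # maps predictors to a predicted risk
Fitter = Callable[[List[List[float]], List[int]], Model]


def auc(y: Sequence[int], p: Sequence[float]) -> float:
    """c-statistic via the Mann-Whitney rank-sum formula (tied risks get average ranks)."""
    order = sorted(range(len(p)), key=lambda i: p[i])
    ranks = [0.0] * len(p)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and p[order[j + 1]] == p[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # 1-based average rank of the tie block
        i = j + 1
    n1 = sum(y)
    n0 = len(y) - n1
    rank_sum = sum(r for r, yi in zip(ranks, y) if yi == 1)
    return (rank_sum - n1 * (n1 + 1) / 2) / (n1 * n0)


def _sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X: List[List[float]], y: List[int], lr: float = 1.0, epochs: int = 100) -> Model:
    """Illustrative batch gradient-descent logistic regression (intercept + one weight per predictor)."""
    n, d = len(X), len(X[0])
    w = [0.0] * (d + 1)  # w[0] is the intercept
    for _ in range(epochs):
        grad = [0.0] * (d + 1)
        for xi, yi in zip(X, y):
            err = _sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * g / n for wj, g in zip(w, grad)]
    return lambda x: _sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))


def optimism_adjusted_auc(X, y, fit: Fitter, n_boot: int = 200, seed: int = 0) -> Tuple[float, float]:
    """Return (apparent, optimism-adjusted) c-statistic using bootstrap optimism correction."""
    rng = random.Random(seed)
    model = fit(X, y)
    apparent = auc(y, [model(x) for x in X])
    n, total_optimism, used = len(y), 0.0, 0
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample patients with replacement
        Xb, yb = [X[i] for i in idx], [y[i] for i in idx]
        if sum(yb) in (0, n):
            continue  # c-statistic undefined without both outcomes
        mb = fit(Xb, yb)
        # optimism = performance in the bootstrap sample minus performance in the original data
        total_optimism += auc(yb, [mb(x) for x in Xb]) - auc(y, [mb(x) for x in X])
        used += 1
    if used == 0:
        raise ValueError("no usable bootstrap samples")
    return apparent, apparent - total_optimism / used


def simulate_codes(n: int, seed: int = 1) -> Tuple[List[List[float]], List[int]]:
    """Hypothetical toy data: four binary 'diagnosis code' flags, two of which raise risk."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        x = [float(rng.random() < 0.3) for _ in range(4)]
        y.append(int(rng.random() < _sigmoid(-1.0 + 1.5 * x[0] + 1.0 * x[1])))
        X.append(x)
    return X, y


if __name__ == "__main__":
    X, y = simulate_codes(200)
    apparent, adjusted = optimism_adjusted_auc(X, y, fit_logistic, n_boot=20)
    print(f"apparent c = {apparent:.3f}, optimism-adjusted c = {adjusted:.3f}")
```

Because the model is passed in through the `fit` argument, a different fitter (for example, one that trains a boosted tree ensemble) could be corrected in the same way, which is what makes the two model classes comparable on an optimism-adjusted basis. The pure-Python fitter here only keeps the sketch self-contained. With hundreds of thousands of patients, as in this study, the optimism is usually small.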
In large datasets of electronic healthcare records, logistic regression and boosted tree models based on numerous diagnosis codes predicted patient mortality comparably well.