Logistic regression and machine learning predicted patient mortality from large sets of diagnosis codes comparably.

Affiliations

Department of Health Services Research and Policy, London School of Hygiene and Tropical Medicine, Keppel Street, London WC1E 7HT, UK; Clinical Effectiveness Unit, Royal College of Surgeons of England, Lincoln's Inn Fields, London WC2A 3PE, UK.

Publication information

J Clin Epidemiol. 2021 May;133:43-52. doi: 10.1016/j.jclinepi.2020.12.018. Epub 2021 Jan 22.

DOI:10.1016/j.jclinepi.2020.12.018

PMID:33359319

Abstract

OBJECTIVE

The objective of the study was to compare the performance of logistic regression and boosted trees for predicting patient mortality from large sets of diagnosis codes in electronic healthcare records.

STUDY DESIGN AND SETTING

We analyzed national hospital records and official death records for patients with myocardial infarction (n = 200,119), hip fracture (n = 169,646), or colorectal cancer surgery (n = 56,515) in England in 2015-2017. One-year mortality was predicted from patient age, sex, and socioeconomic status, together with 202 to 257 International Classification of Diseases 10th Revision codes, each entered as a binary predictor indicating whether or not the code was recorded in the preceding year. Performance measures included the c-statistic, scaled Brier score, and several measures of calibration.
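As an illustration of the two discrimination and overall-performance measures named above, the following is a minimal sketch in plain Python. It is not the authors' code. It assumes the common definitions: the c-statistic as the proportion of (death, survivor) pairs in which the patient who died has the higher predicted risk, with ties counted as one half, and the scaled Brier score as 1 minus the Brier score divided by the Brier score of a model that predicts the overall mortality rate for everyone. The function names and the toy data are hypothetical.

```python
from typing import Sequence


def c_statistic(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Concordance (c-statistic) for a binary outcome.

    Compares every patient who died (y == 1) with every survivor (y == 0)
    and counts how often the patient who died has the higher predicted
    risk. Tied predictions count as 0.5.
    """
    events = [p for y, p in zip(y_true, y_prob) if y == 1]
    non_events = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one death and one survivor")
    concordant = 0.0
    for p_event in events:
        for p_non_event in non_events:
            if p_event > p_non_event:
                concordant += 1.0
            elif p_event == p_non_event:
                concordant += 0.5
    return concordant / (len(events) * len(non_events))


def scaled_brier(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Scaled Brier score: 1 - Brier / Brier_max.

    Brier_max is the Brier score of a non-informative model that assigns
    the observed mortality rate to every patient (an assumed, commonly
    used reference; the abstract does not spell out its definition).
    """
    n = len(y_true)
    brier = sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / n
    prevalence = sum(y_true) / n
    brier_max = prevalence * (1.0 - prevalence)
    if brier_max == 0.0:
        raise ValueError("outcome has no variation")
    return 1.0 - brier / brier_max


# Hypothetical toy data: 1 = died within one year, 0 = survived.
outcomes = [1, 0, 0, 1, 0, 0, 0, 1]
predicted_risk = [0.8, 0.2, 0.1, 0.6, 0.4, 0.3, 0.7, 0.5]

print("c-statistic:", round(c_statistic(outcomes, predicted_risk), 3))
print("scaled Brier:", round(scaled_brier(outcomes, predicted_risk), 3))
```

The pairwise loop is quadratic in the number of patients, which is fine for a toy example but too slow for cohorts of the size reported here; a rank-based computation would be the usual choice at that scale.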

RESULTS

One-year mortality was 17.2% (34,520) after myocardial infarction, 27.2% (46,115) after hip fracture, and 9.3% (5,273) after colorectal surgery. Optimism-adjusted c-statistics for the logistic regression models were 0.884 (95% confidence interval [CI]: 0.882, 0.886), 0.798 (0.796, 0.800), and 0.811 (0.805, 0.817). The equivalent c-statistics for the boosted tree models were 0.891 (95% CI: 0.889, 0.892), 0.804 (0.802, 0.806), and 0.803 (0.797, 0.809). Model performance was also similar when measured using scaled Brier scores. All models were well calibrated overall.

CONCLUSION

In large datasets of electronic healthcare records, logistic regression and boosted tree models of numerous diagnosis codes predicted patient mortality comparably.
