Comparison of Machine Learning Methods With Traditional Models for Use of Administrative Claims With Electronic Medical Records to Predict Heart Failure Outcomes.

Author affiliations

Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts.

Heart and Vascular Center, Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts.

Publication information

JAMA Netw Open. 2020 Jan 3;3(1):e1918962. doi: 10.1001/jamanetworkopen.2019.18962.

DOI:10.1001/jamanetworkopen.2019.18962

PMID:31922560

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6991258/

Abstract

IMPORTANCE

Accurate risk stratification of patients with heart failure (HF) is critical to deploy targeted interventions aimed at improving patients' quality of life and outcomes.

OBJECTIVES

To compare machine learning approaches with traditional logistic regression in predicting key outcomes in patients with HF and evaluate the added value of augmenting claims-based predictive models with electronic medical record (EMR)-derived information.

DESIGN, SETTING, AND PARTICIPANTS

A prognostic study with a 1-year follow-up period was conducted including 9502 Medicare-enrolled patients with HF from 2 health care provider networks in Boston, Massachusetts ("providers" includes physicians, clinicians, other health care professionals, and their institutions that comprise the networks). The study was performed from January 1, 2007, to December 31, 2014; data were analyzed from January 1 to December 31, 2018.

MAIN OUTCOMES AND MEASURES

All-cause mortality, HF hospitalization, top cost decile, and home days loss greater than 25% were modeled using logistic regression, least absolute shrinkage and selection operator (LASSO) regression, classification and regression trees, random forests, and gradient-boosted modeling (GBM). All models were trained using data from network 1 and tested in network 2. After selecting the most efficient modeling approach based on discrimination, Brier score, and calibration, area under precision-recall curves (AUPRCs) and net benefit estimates from decision curves were calculated to focus on the differences when using claims-only vs claims + EMR predictors.
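As a rough illustration of two of the performance measures named above, the following minimal sketch computes a C statistic (discrimination) and a Brier score for a binary outcome using only the Python standard library. The function names and the toy labels and predicted risks are hypothetical and chosen for illustration; they are not the study's code or data.

```python
from itertools import product


def c_statistic(y_true, y_prob):
    """Concordance (C) statistic for a binary outcome.

    Fraction of (event, non-event) pairs in which the event received the
    higher predicted risk; tied predictions count as half-concordant.
    """
    events = [p for y, p in zip(y_true, y_prob) if y == 1]
    non_events = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for p_event, p_non_event in product(events, non_events):
        if p_event > p_non_event:
            concordant += 1.0
        elif p_event == p_non_event:
            concordant += 0.5
    return concordant / (len(events) * len(non_events))


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted risk and observed outcome."""
    if len(y_true) != len(y_prob) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


# Hypothetical toy data: 1 = outcome occurred within follow-up, 0 = it did not.
y_true = [1, 0, 1, 0, 0, 1, 0, 0]
y_prob = [0.8, 0.3, 0.6, 0.4, 0.1, 0.35, 0.2, 0.5]

print("C statistic:", round(c_statistic(y_true, y_prob), 3))
print("Brier score:", round(brier_score(y_true, y_prob), 3))
```

A higher C statistic indicates better ranking of patients by risk, while a lower Brier score indicates predicted probabilities closer to the observed outcomes. The pairwise loop is quadratic in sample size, so a rank-based implementation would be preferable for cohorts of thousands of patients.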

RESULTS

A total of 9502 patients with HF with a mean (SD) age of 78 (8) years were included: 6113 from network 1 (training set) and 3389 from network 2 (testing set). Gradient-boosted modeling consistently provided the highest discrimination, lowest Brier scores, and good calibration across all 4 outcomes; however, logistic regression had generally similar performance (C statistics for logistic regression based on claims-only predictors: mortality, 0.724; 95% CI, 0.705-0.744; HF hospitalization, 0.707; 95% CI, 0.676-0.737; high cost, 0.734; 95% CI, 0.703-0.764; and home days loss, 0.781; 95% CI, 0.764-0.798; C statistics for GBM: mortality, 0.727; 95% CI, 0.708-0.747; HF hospitalization, 0.745; 95% CI, 0.718-0.772; high cost, 0.733; 95% CI, 0.703-0.763; and home days loss, 0.790; 95% CI, 0.773-0.807). Higher AUPRCs were obtained for claims + EMR vs claims-only GBMs predicting mortality (0.484 vs 0.423), HF hospitalization (0.413 vs 0.403), and home time loss (0.575 vs 0.521) but not cost (0.249 vs 0.252). The net benefit for claims + EMR vs claims-only GBMs was higher at various threshold probabilities for mortality and home time loss outcomes but similar for the other 2 outcomes.
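To make the AUPRC and net benefit comparisons more concrete, here is a minimal sketch of how each quantity can be computed for one model. It assumes that AUPRC is estimated as average precision (one common step-wise estimator, not necessarily the one the authors used) and that net benefit follows the standard decision-curve definition, true positives per patient minus false positives per patient weighted by the odds of the threshold probability. All names and the toy data are hypothetical.

```python
def average_precision(y_true, y_prob):
    """Step-wise area under the precision-recall curve (average precision).

    Assumes there are no tied predicted probabilities; ties would need to be
    grouped before precision is evaluated.
    """
    n_events = sum(y_true)
    if n_events == 0:
        raise ValueError("need at least one event")
    # Rank patients from highest to lowest predicted risk.
    ranked = sorted(zip(y_prob, y_true), key=lambda pair: pair[0], reverse=True)
    true_positives = 0
    precision_sum = 0.0
    for rank, (_, outcome) in enumerate(ranked, start=1):
        if outcome == 1:
            true_positives += 1
            precision_sum += true_positives / rank  # precision at this recall step
    return precision_sum / n_events


def net_benefit(y_true, y_prob, threshold):
    """Net benefit of treating patients whose predicted risk is >= threshold."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


# Hypothetical toy data: 1 = outcome occurred, 0 = it did not.
y_true = [1, 0, 1, 0, 0, 1, 0, 0]
y_prob = [0.8, 0.3, 0.6, 0.4, 0.1, 0.35, 0.2, 0.5]

print("Average precision:", round(average_precision(y_true, y_prob), 3))
for pt in (0.25, 0.5):
    treat_all = net_benefit(y_true, [1.0] * len(y_true), pt)  # everyone flagged
    print(pt, round(net_benefit(y_true, y_prob, pt), 3), round(treat_all, 3))
```

An uninformative model has an expected AUPRC roughly equal to the outcome prevalence, which is why AUPRC values are best compared between models for the same outcome rather than across outcomes. A decision curve repeats the net benefit calculation over a range of threshold probabilities and compares each model with the treat-all and treat-none (net benefit 0) strategies.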

CONCLUSIONS AND RELEVANCE

Machine learning methods offered only limited improvement over traditional logistic regression in predicting key HF outcomes. Inclusion of additional predictors from EMRs to claims-based models appeared to improve prediction for some, but not all, outcomes.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2579/6991258/d83dd61f7fb0/jamanetwopen-3-e1918962-g001.jpg
