PREDICTIVE MODELING OF HOSPITAL READMISSION RATES USING ELECTRONIC MEDICAL RECORD-WIDE MACHINE LEARNING: A CASE-STUDY USING MOUNT SINAI HEART FAILURE COHORT.

Authors

Shameer Khader, Johnson Kipp W, Yahi Alexandre, Miotto Riccardo, Li L I, Ricks Doran, Jebakaran Jebakumar, Kovatch Patricia, Sengupta Partho P, Gelijns Sengupta, Moskovitz Alan, Darrow Bruce, David David L, Kasarskis Andrew, Tatonetti Nicholas P, Pinney Sean, Dudley Joel T

Affiliations

1. Department of Genetics and Genomics, Icahn Institute of Genomics and Multiscale Biology, New York, NY, USA.
2. Institute of Next Generation Healthcare, Mount Sinai Health System, New York, NY, USA.

Publication information

Pac Symp Biocomput. 2017;22:276-287. doi: 10.1142/9789813207813_0027.

DOI:10.1142/9789813207813_0027

PMID:27896982

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5362124/

Abstract

Reducing preventable hospital readmissions that result from chronic or acute conditions such as stroke, heart failure, myocardial infarction and pneumonia remains a significant challenge for improving outcomes and decreasing the cost of healthcare delivery in the United States. Patient readmission rates remain relatively high for conditions like heart failure (HF) despite the implementation of high-quality healthcare delivery operation guidelines created by regulatory authorities. Multiple predictive models are currently available to evaluate potential 30-day readmission rates of patients. Most of these models are hypothesis driven and repeatedly assess the predictive abilities of the same set of biomarkers as predictive features. In this manuscript, we discuss our attempt to develop a data-driven, electronic medical record-wide (EMR-wide) feature selection approach and subsequent machine learning to predict readmission probabilities. We assessed a large repertoire of variables from the electronic medical records of heart failure patients in a single center. The cohort included 1,068 patients, of whom 178 were readmitted within a 30-day interval (16.66% readmission rate). A total of 4,205 variables were extracted from the EMR, including diagnosis codes (n=1,763), medications (n=1,028), laboratory measurements (n=846), surgical procedures (n=564) and vital signs (n=4). We designed a multistep modeling strategy using the Naïve Bayes algorithm. In the first step, we created individual models to classify the cases (readmitted) and controls (non-readmitted). In the second step, features contributing to predictive risk from the independent models were combined into a composite model using a correlation-based feature selection (CFS) method. All models were trained and tested using a 5-fold cross-validation method, with 70% of the cohort used for training and the remaining 30% for testing. Compared to existing predictive models for HF readmission rates (AUCs in the range of 0.6-0.7), results from our EMR-wide predictive model (AUC=0.78; Accuracy=83.19%) and phenome-wide feature selection strategies are encouraging and reveal the utility of such data-driven machine learning. Fine tuning of the model, replication using multi-center cohorts and a prospective clinical trial to evaluate clinical utility would help the adoption of the model as a clinical decision system for evaluating readmission status.
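The evaluation loop described in the abstract (a Naïve Bayes classifier scored by AUC under 5-fold cross-validation) can be sketched roughly as follows. This is an illustrative sketch only, not the authors' code: the data are synthetic, generated to match only the reported cohort size (1,068 patients, 178 readmitted), and the feature count, the Bernoulli (binary-feature) variant of Naïve Bayes, the stratified folds and all function names are assumptions made for the example. The CFS step and the 70%/30% split mentioned in the abstract are not reproduced here.

```python
import math
import random


def make_cohort(n=1068, n_readmitted=178, n_features=20, seed=0):
    """Synthetic binary features (assumption); the first 5 are weakly tied to the label."""
    rng = random.Random(seed)
    labels = [1] * n_readmitted + [0] * (n - n_readmitted)
    rng.shuffle(labels)
    rows = []
    for y in labels:
        row = []
        for j in range(n_features):
            p = 0.5 if (j < 5 and y == 1) else 0.3
            row.append(1 if rng.random() < p else 0)
        rows.append(row)
    return rows, labels


def fit_bernoulli_nb(X, y):
    """Estimate class priors and P(x_j = 1 | class) with Laplace smoothing."""
    model = {}
    for c in (0, 1):
        rows = [x for x, label in zip(X, y) if label == c]
        prior = len(rows) / len(X)
        probs = [(sum(r[j] for r in rows) + 1) / (len(rows) + 2)
                 for j in range(len(X[0]))]
        model[c] = (prior, probs)
    return model


def score(model, x):
    """Log-odds of readmission (class 1 vs class 0) for one patient."""
    logp = {}
    for c, (prior, probs) in model.items():
        lp = math.log(prior)
        for xj, pj in zip(x, probs):
            lp += math.log(pj if xj else 1 - pj)
        logp[c] = lp
    return logp[1] - logp[0]


def auc(scores, labels):
    """AUC as the chance a random case outranks a random control (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def cross_validated_auc(X, y, k=5, seed=0):
    """Stratified k-fold: each fold keeps roughly the cohort's case/control ratio."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for c in (0, 1):
        idx = [i for i, label in enumerate(y) if label == c]
        rng.shuffle(idx)
        for position, i in enumerate(idx):
            folds[position % k].append(i)
    aucs = []
    for held_out in folds:
        test = set(held_out)
        train = [i for i in range(len(y)) if i not in test]
        model = fit_bernoulli_nb([X[i] for i in train], [y[i] for i in train])
        scores = [score(model, X[i]) for i in held_out]
        aucs.append(auc(scores, [y[i] for i in held_out]))
    return sum(aucs) / k


X, y = make_cohort()
readmission_rate = sum(y) / len(y)
mean_auc = cross_validated_auc(X, y)
print(f"readmission rate: {readmission_rate:.4f}")
print(f"mean 5-fold AUC on synthetic data: {mean_auc:.3f}")
```

Because the features are simulated, the AUC printed here says nothing about the published result (AUC=0.78); the sketch only shows how such a figure is obtained from held-out folds rather than from the training data.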
