Taylor R Andrew, Pare Joseph R, Venkatesh Arjun K, Mowafi Hani, Melnick Edward R, Fleischman William, Hall M Kennedy
Department of Emergency Medicine, Yale University, Yale-New Haven Hospital, New Haven, CT.
Acad Emerg Med. 2016 Mar;23(3):269-78. doi: 10.1111/acem.12876. Epub 2016 Feb 13.
Predictive analytics in emergency care has mostly been limited to the use of clinical decision rules (CDRs) in the form of simple heuristics and scoring systems. In the development of CDRs, limitations in analytic methods and concerns about usability have generally constrained models to a small, preselected set of variables judged to be clinically relevant and to rules that are easily calculated. Furthermore, CDRs frequently suffer from questions of generalizability, take years to develop, and cannot be updated as new information becomes available. Newer analytic and machine learning techniques capable of harnessing the large number of variables already available through electronic health records (EHRs) may better predict patient outcomes and facilitate automation and deployment within clinical decision support systems. In this proof-of-concept study, a local, big data-driven, machine learning approach is compared to existing CDRs and traditional analytic methods, using prediction of in-hospital mortality among patients with sepsis as the use case.
This was a retrospective study of adult ED visits from October 2013 to October 2014 that resulted in hospital admission and met criteria for sepsis. Sepsis was defined as meeting criteria for systemic inflammatory response syndrome with an infectious admitting diagnosis in the ED. ED visits were randomly partitioned into an 80%/20% split for training and validation. A random forest model (the machine learning approach) was constructed to predict in-hospital mortality using over 500 clinical variables drawn from data available within the EHRs of four hospitals. The machine learning prediction model was then compared on the validation data set to a classification and regression tree (CART) model, a logistic regression model, and previously developed prediction tools, using area under the receiver operating characteristic curve (AUC) and chi-square statistics.
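The partitioning step and the evaluation metric described above can be illustrated with a minimal sketch. It is not the authors' code: the function names `split_visits` and `auc`, the seed, and the toy labels and scores are assumptions made for illustration, and the models themselves (random forest, CART, logistic regression) are omitted. The AUC is computed here through its rank interpretation, the probability that a randomly chosen death receives a higher predicted risk than a randomly chosen survivor.

```python
import random


def split_visits(visit_ids, train_fraction=0.8, seed=0):
    """Randomly partition visit identifiers into training and validation lists."""
    shuffled = list(visit_ids)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def auc(labels, scores):
    """AUC as P(score of a positive > score of a negative); ties count as 0.5."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical identifiers: one integer per sepsis visit in the cohort.
train_ids, valid_ids = split_visits(range(5278))
print(len(train_ids), len(valid_ids))

# Toy example: 1 = died in hospital, 0 = survived; scores are made-up risks.
toy_labels = [1, 0, 1, 0, 0]
toy_scores = [0.9, 0.2, 0.4, 0.5, 0.1]
print(round(auc(toy_labels, toy_scores), 3))
```

The pairwise loop is quadratic in the number of cases, which is acceptable for a validation set of about a thousand visits; a rank-based formula would scale better for larger cohorts.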
There were 5,278 visits among 4,676 unique patients who met criteria for sepsis. Of the 4,222 visits in the training group, 210 (5.0%) ended in death during hospitalization, and of the 1,056 visits in the validation group, 50 (4.7%) ended in death during hospitalization. The AUCs with 95% confidence intervals (CIs) for the different models were as follows: random forest model, 0.86 (95% CI = 0.82 to 0.90); CART model, 0.69 (95% CI = 0.62 to 0.77); logistic regression model, 0.76 (95% CI = 0.69 to 0.82); CURB-65, 0.73 (95% CI = 0.67 to 0.80); MEDS, 0.71 (95% CI = 0.63 to 0.77); and mREMS, 0.72 (95% CI = 0.65 to 0.79). The random forest model AUC differed significantly from that of every other model (p ≤ 0.003 for all comparisons).
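The reported group sizes and mortality percentages can be checked with a few lines of arithmetic. The sketch below uses only the counts stated above; the variable names are illustrative. Because the two groups sum to the 5,278 visits rather than to the 4,676 unique patients, the partition appears to have been made at the visit level.

```python
# Counts as reported in the results.
total_visits = 5278
train_n, train_deaths = 4222, 210
valid_n, valid_deaths = 1056, 50

# The two groups together account for every visit in the cohort.
assert train_n + valid_n == total_visits

train_share = train_n / total_visits          # fraction of visits used for training
train_mortality = 100 * train_deaths / train_n
valid_mortality = 100 * valid_deaths / valid_n

print(f"training share: {train_share:.1%}")
print(f"training mortality: {train_mortality:.1f}%")
print(f"validation mortality: {valid_mortality:.1f}%")
```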
In this proof-of-concept study, a local big data-driven, machine learning approach outperformed existing CDRs as well as traditional analytic techniques for predicting in-hospital mortality of ED patients with sepsis. Future research should prospectively evaluate the effectiveness of this approach and whether it translates into improved clinical outcomes for high-risk sepsis patients. The methods developed serve as an example of a new model for predictive analytics in emergency care that can be automated, applied to other clinical outcomes of interest, and deployed in EHRs to enable locally relevant clinical predictions.