Early Detection of Septic Shock Onset Using Interpretable Machine Learners.

Author information

Misra Debdipto, Avula Venkatesh, Wolk Donna M, Farag Hosam A, Li Jiang, Mehta Yatin B, Sandhu Ranjeet, Karunakaran Bipin, Kethireddy Shravan, Zand Ramin, Abedi Vida

Affiliations

Steele Institute for Health Innovation, Geisinger Health System, Danville, PA 17822, USA.

Department of Molecular and Functional Genomics, Geisinger Health System, Danville, PA 17822, USA.

Publication information

J Clin Med. 2021 Jan 15;10(2):301. doi: 10.3390/jcm10020301.

DOI:10.3390/jcm10020301

PMID:33467539

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7830968/

Abstract

BACKGROUND

Developing decision support systems based on advances in machine learning is one area for strategic innovation in healthcare. Predicting a patient's progression to septic shock is an active field of translational research. The goal of this study was to develop a working model of a clinical decision support system that predicts septic shock in an acute care setting within the first 6 h after admission in an integrated healthcare system.

METHOD

Encounter-level clinical data from the Electronic Health Record (EHR) were used to build a predictive model of progression from sepsis to septic shock up to 6 h from the time of admission, that is, at 1, 3, and 6 h from admission. Eight different machine learning algorithms (Random Forest, XGBoost, C5.0, Decision Trees, Boosted Logistic Regression, Support Vector Machine, Logistic Regression, Regularized Logistic, and Bayes Generalized Linear Model) were used for model development. Two adaptive sampling strategies were used to address class imbalance. Data from two sources (clinical data and billing codes) were used to establish the case definition (septic shock) according to the Centers for Medicare & Medicaid Services (CMS) Sepsis criteria. Models were assessed using the Area Under the Receiver Operating Characteristic curve (AUROC), sensitivity, and specificity. Model predictions for each feature window (1, 3, and 6 h from admission) were consolidated.
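The abstract does not name the two adaptive sampling strategies used against class imbalance. As a simpler, plainly swapped-in illustration of rebalancing, the sketch below uses random undersampling of the majority class, assumed here to be the sepsis-only encounters. The function name `undersample_majority`, the toy data, and the 1:1 target ratio are hypothetical and are not taken from the study.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

Row = TypeVar("Row")


def undersample_majority(
    X: Sequence[Row], y: Sequence[int], seed: int = 0
) -> Tuple[List[Row], List[int]]:
    """Randomly drop majority-class rows until both classes are the same size.

    Hypothetical stand-in for the study's unnamed adaptive sampling strategies.
    """
    if len(X) != len(y):
        raise ValueError("X and y must have the same length")
    rng = random.Random(seed)  # fixed seed so the subsample is reproducible
    pos = [i for i, label in enumerate(y) if label == 1]  # septic shock (assumed label 1)
    neg = [i for i, label in enumerate(y) if label == 0]  # sepsis only (assumed label 0)
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    # Keep every minority row plus an equal-sized random draw from the majority,
    # then sort the indices so the original row order is preserved.
    kept = sorted(minority + rng.sample(majority, len(minority)))
    return [X[i] for i in kept], [y[i] for i in kept]


# Toy example: 2 septic shock encounters among 10.
X = list(range(10))  # hypothetical encounter IDs standing in for feature rows
y = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
X_bal, y_bal = undersample_majority(X, y)
print(y_bal.count(1), y_bal.count(0))  # 2 2
```

Rebalancing of this kind is normally applied only to the training split; the evaluation split keeps its real class mix so that sensitivity and specificity reflect the case mix a deployed model would see.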

RESULTS

Retrospective data from April 2005 to September 2018 were extracted from the EHR, insurance claims, billing, and laboratory systems to create a dataset for septic shock detection. The clinical criteria and the billing information were each used to label patients into two classes, septic shock and sepsis, at three time points after admission, creating two different case-control cohorts. Data from 45,425 unique in-patient visits were used to build 96 prediction models, comparing a clinically based case definition with a billing-based one as the gold standard. Of the 24 consolidated models (eight machine learning algorithms across three feature windows), four reached an AUROC greater than 0.9, and all reached an AUROC of at least 0.8820. The best model, based on Random Forest, had an AUROC of 0.9483, a sensitivity of 83.9%, and a specificity of 88.1%. The 6-h detection window outperformed the 1- and 3-h windows. Models using the case definition based on clinical variables performed better than those using the definition based only on billing information.
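The AUROC, sensitivity, and specificity reported above can be computed as in the minimal sketch below. It uses the pairwise (Mann-Whitney) formulation of AUROC: the fraction of positive/negative pairs in which the positive case receives the higher score, with ties counted as one half. The labels, scores, and the 0.5 decision threshold are hypothetical; the abstract does not state the threshold at which sensitivity and specificity were measured.

```python
from typing import Sequence, Tuple


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive outranks a random negative (ties = 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(
    labels: Sequence[int], scores: Sequence[float], threshold: float = 0.5
) -> Tuple[float, float]:
    """Sensitivity = TP/(TP+FN), specificity = TN/(TN+FP) at a fixed threshold."""
    tp = fn = tn = fp = 0
    for y, s in zip(labels, scores):
        predicted_positive = s >= threshold  # hypothetical 0.5 cut-off by default
        if y == 1 and predicted_positive:
            tp += 1
        elif y == 1:
            fn += 1
        elif not predicted_positive:
            tn += 1
        else:
            fp += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative label")
    return tp / (tp + fn), tn / (tn + fp)


# Toy data: 1 = septic shock, 0 = sepsis; scores are hypothetical model outputs.
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1]
print(round(auroc(labels, scores), 4))  # 0.9167
print(sensitivity_specificity(labels, scores))  # (0.6666666666666666, 0.75)
```

The pairwise loop is quadratic and only meant to make the definition explicit; for tens of thousands of encounters, a library routine such as scikit-learn's `roc_auc_score` computes the same quantity efficiently. Unlike AUROC, sensitivity and specificity depend on the chosen threshold.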

CONCLUSION

This study corroborated that machine learning models can be developed to predict septic shock using clinical and administrative data. However, models that used clinical information to define septic shock outperformed those based only on administrative data. Intelligent decision support tools can be developed and integrated into the EHR to improve clinical outcomes and help optimize resources in real time.