Hu Chang, Li Lu, Huang Weipeng, Wu Tong, Xu Qiancheng, Liu Juan, Hu Bo
Department of Critical Care Medicine, Zhongnan Hospital of Wuhan University, Wuhan, 430071, Hubei, China.
Clinical Research Center of Hubei Critical Care Medicine, Wuhan, 430071, Hubei, China.
Infect Dis Ther. 2022 Jun;11(3):1117-1132. doi: 10.1007/s40121-022-00628-6. Epub 2022 Apr 10.
This study aimed to develop and validate an interpretable machine-learning model, based on clinical features, for the early prediction of in-hospital mortality in critically ill patients with sepsis.
We enrolled all patients with sepsis in the Medical Information Mart for Intensive Care IV (MIMIC-IV, v1.0) database from 2008 to 2019. Least absolute shrinkage and selection operator (LASSO) regression was used for feature selection. Seven machine-learning methods were applied to develop the models. The best model was selected based on its accuracy and area under the receiver operating characteristic curve (AUC) in the validation cohort. Furthermore, we employed the SHapley Additive exPlanations (SHAP) method to illustrate the contribution of each feature to the model, to analyze how individual features affect the model output, and to visualize the Shapley values for a single patient.
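The abstract does not state which LASSO variant or penalty strength was used. As a minimal sketch, assuming a linear LASSO on standardized features fitted by cyclic coordinate descent (for a binary outcome such as death, an L1-penalized logistic regression is the more usual choice), feature selection amounts to keeping the predictors whose coefficients are not shrunk to exactly zero. The function names, the penalty `lam`, the feature names, and the toy data are illustrative assumptions, not the authors' code.

```python
from typing import List


def soft_threshold(rho: float, lam: float) -> float:
    """Soft-thresholding operator S(rho, lam) used in the LASSO coordinate update."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0


def lasso_coordinate_descent(X: List[List[float]], y: List[float], lam: float,
                             n_iter: int = 100, tol: float = 1e-10) -> List[float]:
    """Minimize (1/2n)*||y - X b||^2 + lam*||b||_1 by cyclic coordinate descent.

    Assumption: every column of X has mean 0 and mean square 1 (standardized).
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # r = y - X b, with b = 0 at the start
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            # Correlation of feature j with the partial residual that excludes feature j.
            rho = sum(X[i][j] * residual[i] for i in range(n)) / n + beta[j]
            new_bj = soft_threshold(rho, lam)
            delta = new_bj - beta[j]
            if delta != 0.0:
                # Keep the residual consistent with the updated coefficient.
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                beta[j] = new_bj
            max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return beta


def select_features(names: List[str], beta: List[float]) -> List[str]:
    """Retain the features whose LASSO coefficient is nonzero."""
    return [name for name, b in zip(names, beta) if b != 0.0]


if __name__ == "__main__":
    # Hypothetical toy data: three orthogonal, standardized features.
    X = [[1, 1, 1], [1, -1, -1], [-1, 1, -1], [-1, -1, 1]]
    y = [3 * row[0] + 0.5 * row[1] for row in X]  # strong effect, weak effect, none
    beta = lasso_coordinate_descent(X, y, lam=1.0)
    print(beta)  # [2.0, 0.0, 0.0]
    print(select_features(["feature_a", "feature_b", "feature_c"], beta))  # ['feature_a']
```

With `lam=1.0`, the weak effect of the second feature is shrunk to exactly zero, so only the first feature survives; larger penalties select fewer features. The penalty is commonly tuned by cross-validation rather than fixed by hand.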
In total, 8817 patients with sepsis were included; the median age was 66.8 years (IQR, 55.9-77.1 years), and 3361 of the 8817 participants (38.1%) were women. After feature selection, 25 of the 57 clinical parameters collected on day 1 after ICU admission remained associated with prognosis and were used to develop the machine-learning models. Among the seven models constructed, the eXtreme Gradient Boosting (XGBoost) model performed best, with an AUC of 0.884 and an accuracy of 89.5% in the validation cohort. Feature importance analysis showed that the Glasgow Coma Scale (GCS) score, blood urea nitrogen, respiratory rate, urine output, and age were the five features with the greatest impact on the XGBoost model. Furthermore, SHAP force plots visualized how the model arrived at individualized predictions of death.
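The two validation metrics used to rank the models can be computed as follows. This is a minimal sketch: AUC is computed as the probability that a randomly chosen death receives a higher predicted risk than a randomly chosen survivor (ties count as one half), and accuracy assumes a 0.5 decision threshold, which the abstract does not specify. The labels and scores in the example are made up.

```python
from typing import Sequence


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as one half; this equals the normalized Mann-Whitney U statistic.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def accuracy(y_true: Sequence[int], scores: Sequence[float],
             threshold: float = 0.5) -> float:
    """Fraction of cases whose thresholded prediction matches the label (assumed cut-off 0.5)."""
    correct = sum(int((s >= threshold) == bool(y)) for y, s in zip(y_true, scores))
    return correct / len(y_true)


if __name__ == "__main__":
    # Made-up example: 1 = in-hospital death, 0 = survived.
    labels = [1, 0, 1, 0, 0]
    risk = [0.9, 0.2, 0.4, 0.6, 0.1]
    print(roc_auc(labels, risk))   # 0.8333333333333334
    print(accuracy(labels, risk))  # 0.6
```

AUC summarizes how well the model ranks patients across all thresholds, whereas accuracy depends on the chosen cut-off, so the two give complementary views of performance. The pairwise loop costs O(n_pos * n_neg); a rank-based computation scales better for cohorts of thousands of patients.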
We have demonstrated the potential of machine-learning approaches for the early prediction of outcome in patients with sepsis. The SHAP method can improve the interpretability of machine-learning models and help clinicians better understand the reasoning behind a predicted outcome.
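To make the SHAP idea concrete: a single patient's prediction is split into per-feature contributions (Shapley values) that sum to the difference between that prediction and a reference prediction, which is what a force plot displays. The sketch below computes exact Shapley values by enumerating every feature coalition, under the simplifying assumption that "absent" features are set to one fixed baseline vector. The SHAP library instead derives the reference from background data or the training distribution and uses fast algorithms such as TreeSHAP for tree ensembles like XGBoost. The toy scoring function `risk` and its inputs are hypothetical.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Dict, Sequence


def shapley_values(model: Callable[[Sequence[float]], float],
                   x: Sequence[float],
                   baseline: Sequence[float]) -> Dict[int, float]:
    """Exact Shapley value of each feature for one input x (feasible only for few features)."""
    p = len(x)

    def value(coalition: frozenset) -> float:
        # Features in the coalition take the patient's values; the rest take the baseline.
        z = [x[i] if i in coalition else baseline[i] for i in range(p)]
        return model(z)

    phi: Dict[int, float] = {}
    for i in range(p):
        others = [j for j in range(p) if j != i]
        total = 0.0
        for size in range(p):
            # Shapley weight |S|! (p - |S| - 1)! / p! for coalitions S of this size.
            weight = factorial(size) * factorial(p - size - 1) / factorial(p)
            for subset in combinations(others, size):
                s = frozenset(subset)
                total += weight * (value(s | {i}) - value(s))  # marginal contribution of i
        phi[i] = total
    return phi


def risk(z: Sequence[float]) -> float:
    """Hypothetical risk score with two additive terms and one interaction."""
    return 0.5 * z[0] + 2.0 * z[1] + z[0] * z[2]


if __name__ == "__main__":
    patient = [2.0, 1.0, 3.0]
    reference = [0.0, 0.0, 0.0]
    phi = shapley_values(risk, patient, reference)
    print({k: round(v, 9) for k, v in phi.items()})  # {0: 4.0, 1: 2.0, 2: 3.0}
    # Efficiency: contributions add up to risk(patient) - risk(reference) = 9.0
    print(round(sum(phi.values()), 9))  # 9.0
```

The interaction term `z[0] * z[2]` contributes 6 to this patient's score and is split equally between features 0 and 2, while the purely additive terms are credited entirely to their own feature. Exhaustive enumeration grows as 2^p, which is why practical SHAP implementations rely on model-specific algorithms or sampling.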