National Institute of Health Data Science, Peking University, Beijing, China.
Center for Data Science in Health and Medicine, Peking University, Beijing, China.
BMC Med Inform Decis Mak. 2020 Oct 2;20(1):251. doi: 10.1186/s12911-020-01271-2.
Early and accurate identification of sepsis patients at high risk of in-hospital death can help physicians in intensive care units (ICUs) make optimal clinical decisions. This study aimed to develop machine learning-based tools to predict the risk of in-hospital death in ICU patients with sepsis.
The models were developed and validated on the Medical Information Mart for Intensive Care (MIMIC) III database. We identified adult sepsis patients using the Sepsis-3 definition. A total of 86 predictor variables, comprising demographics, laboratory tests and comorbidities, were used. We developed prediction models using the least absolute shrinkage and selection operator (LASSO), random forest (RF), gradient boosting machine (GBM) and traditional logistic regression (LR). The prediction performance of the four models was evaluated and compared with that of an existing scoring tool, the simplified acute physiology score (SAPS) II, using five performance measures: the area under the receiver operating characteristic curve (AUROC), Brier score, sensitivity, specificity and calibration plot.
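As an illustration of the numeric performance measures named above, the following is a minimal sketch in plain Python of AUROC (via the rank-sum identity), Brier score, and sensitivity/specificity. The function names, the 0.5 decision threshold and the four-patient toy data are assumptions made for this example; they are not taken from the study, which does not describe its implementation here.

```python
from typing import Sequence, Tuple


def auroc(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """AUROC via the Mann-Whitney rank-sum identity; tied scores share an average rank."""
    pairs = sorted(zip(y_prob, y_true))
    n = len(pairs)
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over the run of tied scores starting at i.
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    n_pos = sum(y for _, y in pairs)
    n_neg = n - n_pos
    rank_sum_pos = sum(r for r, (_, y) in zip(ranks, pairs) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and outcome (lower is better)."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def sensitivity_specificity(
    y_true: Sequence[int], y_prob: Sequence[float], threshold: float = 0.5
) -> Tuple[float, float]:
    """Sensitivity and specificity when predicting death for p >= threshold (assumed cut-off)."""
    tp = sum(1 for y, p in zip(y_true, y_prob) if y == 1 and p >= threshold)
    fn = sum(1 for y, p in zip(y_true, y_prob) if y == 1 and p < threshold)
    tn = sum(1 for y, p in zip(y_true, y_prob) if y == 0 and p < threshold)
    fp = sum(1 for y, p in zip(y_true, y_prob) if y == 0 and p >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


# Toy data (hypothetical): 1 = in-hospital death, 0 = survived.
toy_outcomes = [0, 0, 1, 1]
toy_risks = [0.1, 0.4, 0.35, 0.8]

if __name__ == "__main__":
    print("AUROC:", auroc(toy_outcomes, toy_risks))
    print("Brier:", brier_score(toy_outcomes, toy_risks))
    print("Sens/Spec:", sensitivity_specificity(toy_outcomes, toy_risks))
```

AUROC measures discrimination (how well the model ranks patients who died above those who survived), whereas the Brier score also penalises miscalibrated probabilities, which is why the two are reported side by side.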
The records of 16,688 sepsis patients in MIMIC III were used for model training and testing. Of these, 2949 (17.7%) died in hospital. The average AUROCs of the LASSO, RF, GBM, LR and SAPS II models were 0.829, 0.829, 0.845, 0.833 and 0.77, respectively. The Brier scores of the LASSO, RF, GBM, LR and SAPS II models were 0.108, 0.109, 0.104, 0.107 and 0.146, respectively. The calibration plots showed that the GBM, LASSO and LR models were well calibrated; the RF model underestimated risk in high-risk patients; and SAPS II had the poorest calibration.
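The data behind a calibration plot can be sketched as follows: predictions are grouped into equal-width probability bins, and the mean predicted risk in each bin is compared with the observed death rate. This is a minimal illustration in plain Python; the function name, the equal-width binning scheme and the toy data are assumptions, and the study may have constructed its plots differently.

```python
from typing import List, Sequence, Tuple


def calibration_table(
    y_true: Sequence[int], y_prob: Sequence[float], n_bins: int = 10
) -> List[Tuple[float, float, int]]:
    """Return (mean predicted risk, observed event rate, count) for each non-empty bin."""
    bins: List[List[Tuple[int, float]]] = [[] for _ in range(n_bins)]
    for y, p in zip(y_true, y_prob):
        # Equal-width bins on [0, 1]; p == 1.0 is placed in the last bin.
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((y, p))
    table = []
    for members in bins:
        if not members:
            continue  # skip empty bins rather than report undefined rates
        count = len(members)
        mean_pred = sum(p for _, p in members) / count
        observed = sum(y for y, _ in members) / count
        table.append((mean_pred, observed, count))
    return table


# Toy data (hypothetical): 1 = in-hospital death, 0 = survived.
toy_outcomes = [0, 0, 1, 0, 1, 1]
toy_risks = [0.05, 0.15, 0.2, 0.6, 0.7, 0.9]

if __name__ == "__main__":
    for mean_pred, observed, count in calibration_table(toy_outcomes, toy_risks, n_bins=2):
        print(f"predicted={mean_pred:.3f} observed={observed:.3f} n={count}")
```

In a well-calibrated model the two values in each bin are close. A model that underestimates risk in high-risk patients would show observed rates above the mean predicted risk in the upper bins.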
The machine learning-based models developed in this study showed good prediction performance. Among them, the GBM model performed best in predicting the risk of in-hospital death. It has the potential to help ICU physicians choose appropriate clinical interventions for critically ill sepsis patients and thus may help improve their prognoses.