Department of Business Analytics and Information Systems, Tippie College of Business, University of Iowa, Iowa City, IA, United States of America.
Civil and Environmental Engineering Department, Michigan State University, East Lansing, MI, United States of America.
PLoS One. 2022 May 5;17(5):e0262895. doi: 10.1371/journal.pone.0262895. eCollection 2022.
Improving the Intensive Care Unit (ICU) management network and building cost-effective, well-managed healthcare systems are high priorities for healthcare units. Accurate and explainable mortality prediction models help identify the most critical risk factors for patients' survival/death status and detect the most in-need patients early. This study proposes a highly accurate and efficient machine learning model for predicting ICU mortality status upon discharge using the information available during the first 24 hours of admission. The most important features in mortality prediction are identified, and the effects of changing each feature on the prediction are studied. We used supervised machine learning models and illness severity scoring systems to benchmark the mortality prediction. We also implemented a combination of SHAP, LIME, partial dependence, and individual conditional expectation plots to explain the predictions made by the best-performing model (CatBoost). We proposed E-CatBoost, an optimized and efficient patient mortality prediction model, which can accurately predict the patients' discharge status using only ten input features. We used eICU-CRD v2.0 to train and validate the models; the dataset contains information on over 200,000 ICU admissions. The patients were divided into twelve disease groups, and models were fitted and tuned for each group. The models' predictive performance was evaluated using the area under the receiver operating characteristic curve (AUROC). Across the defined disease groups, the AUROC scores ranged from 0.86 [std: 0.02] to 0.92 [std: 0.02] for the CatBoost models and from 0.83 [std: 0.02] to 0.91 [std: 0.03] for the E-CatBoost models; measured over the entire patient population, their AUROC scores were 7 to 18 percent and 2 to 12 percent higher than those of the baseline models, respectively.
Based on SHAP explanations, we found age, heart rate, respiratory rate, blood urea nitrogen, and creatinine level to be the most critical cross-disease features in mortality prediction.
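As an illustration of the evaluation metric named in the abstract, the following is a minimal sketch of how an AUROC score and a mean/std summary could be computed. It is not the authors' code: the function names `auroc` and `summarize`, the toy labels and scores, and the assumption that the reported std is taken across repeated validation folds are all hypothetical, since the abstract does not state how the std was obtained.

```python
from statistics import mean, stdev


def auroc(labels, scores):
    """AUROC via the rank-sum (Mann-Whitney U) identity.

    labels: 0/1 outcomes (1 = died, 0 = survived; an assumed encoding).
    scores: predicted risk, higher meaning more likely to be 1.
    Tied scores receive their average rank.
    """
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j to cover every entry tied with pairs[i].
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1

    n_pos = sum(label for _, label in pairs)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUROC needs at least one positive and one negative")

    rank_sum_pos = sum(r for r, (_, label) in zip(ranks, pairs) if label == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def summarize(fold_scores):
    """Mean and sample standard deviation of per-fold AUROC values."""
    return mean(fold_scores), stdev(fold_scores)


# Toy data only; these are not values from the study.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_auroc = auroc(toy_labels, toy_scores)

# Hypothetical per-fold AUROC values, used only to show the summary step.
fold_mean, fold_std = summarize([0.5, 0.75, 1.0])
```

An AUROC of 0.5 corresponds to random ranking and 1.0 to perfect separation of the two outcome classes, which is why it is a common choice for imbalanced outcomes such as ICU mortality.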