Department of Industrial and Systems Engineering, University of Southern California (USC), Los Angeles, CA, United States of America.
PLoS One. 2024 Sep 4;19(9):e0309383. doi: 10.1371/journal.pone.0309383. eCollection 2024.
Mechanical ventilation (MV) is vital for critically ill ICU patients but carries significant mortality risks. This study aims to develop a predictive model to estimate hospital mortality among MV patients, utilizing comprehensive health data to assist ICU physicians with early-stage alerts.
We developed a Machine Learning (ML) framework to predict hospital mortality in ICU patients receiving MV. Using the MIMIC-III database, we identified 25,202 eligible patients through ICD-9 codes. We employed backward elimination and the Lasso method, selecting 32 features based on clinical insights and literature. Data preprocessing included eliminating columns with over 90% missing data and using mean imputation for the remaining missing values. To address class imbalance, we used the Synthetic Minority Over-sampling Technique (SMOTE). We evaluated several ML models, including CatBoost, XGBoost, Decision Tree, Random Forest, Support Vector Machine (SVM), K-Nearest Neighbors (KNN), and Logistic Regression, using a 70/30 train-test split. The CatBoost model was chosen for its superior performance in terms of accuracy, precision, recall, F1-score, AUROC metrics, and calibration plots.
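The preprocessing and resampling steps described above can be illustrated with a minimal, standard-library-only sketch. It is not the authors' code: the function names, the toy rows, and the parameters `k` and `seed` are hypothetical, and `smote_like` only imitates the core SMOTE idea of interpolating between a minority sample and one of its nearest minority neighbours. The abstract does not say where in the pipeline imputation and SMOTE were applied; a common precaution is to fit both on the training split only, so that the test split stays untouched.

```python
import math
import random


def drop_sparse_columns(rows, max_missing=0.9):
    """Keep only columns whose share of missing (None) values is <= max_missing."""
    n_rows = len(rows)
    keep = [
        j for j in range(len(rows[0]))
        if sum(row[j] is None for row in rows) / n_rows <= max_missing
    ]
    return [[row[j] for j in keep] for row in rows], keep


def mean_impute(rows):
    """Replace each None with the mean of the observed values in its column."""
    means = []
    for j in range(len(rows[0])):
        observed = [row[j] for row in rows if row[j] is not None]
        if not observed:
            raise ValueError(f"column {j} has no observed values")
        means.append(sum(observed) / len(observed))
    return [[means[j] if v is None else v for j, v in enumerate(row)] for row in rows]


def smote_like(minority, n_new, k=2, seed=0):
    """Create n_new synthetic minority points (needs at least two samples).

    Each point lies on the segment between a randomly chosen minority sample
    and one of its k nearest minority neighbours.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Hypothetical toy data: column 1 is entirely missing, column 2 partly missing.
rows = [
    [70.0, None, 1.2],
    [65.0, None, None],
    [80.0, None, 0.8],
    [75.0, None, 1.0],
]
kept, keep = drop_sparse_columns(rows)  # column 1 exceeds the 90% threshold
imputed = mean_impute(kept)             # the remaining gap gets the column mean

# Hypothetical minority-class feature vectors to oversample.
minority = [[1.0, 2.0], [2.0, 3.0], [3.0, 5.0]]
synthetic = smote_like(minority, n_new=3)
```

In practice a maintained implementation (for example the SMOTE class in imbalanced-learn) would normally be preferred over a hand-written version like this one.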
The study involved a cohort of 25,202 patients on MV. The CatBoost model attained an AUROC of 0.862, an improvement over the initial AUROC of 0.821, which was the best previously reported in the literature. It also achieved an accuracy of 0.789 and an F1-score of 0.747, and it was better calibrated than the other models evaluated. We attribute these improvements to systematic feature selection and the robust gradient boosting architecture of CatBoost.
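For readers less familiar with the reported metrics, the following sketch shows how accuracy, F1-score, and AUROC can be computed from labels and predicted scores. The labels, scores, and the 0.5 decision threshold are hypothetical toy values, not study data, and the helper names are our own. AUROC is computed here through its pairwise interpretation: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half.

```python
def auroc(y_true, scores):
    """AUROC as the share of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def accuracy_and_f1(y_true, y_pred):
    """Accuracy and F1-score for binary labels (1 = positive class)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    accuracy = (tp + tn) / len(y_true)
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return accuracy, f1


# Hypothetical toy example: 1 = died in hospital, 0 = survived.
y_true = [1, 0, 1, 0, 1, 0]
scores = [0.9, 0.2, 0.6, 0.7, 0.8, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

auc = auroc(y_true, scores)
acc, f1 = accuracy_and_f1(y_true, y_pred)
```

AUROC depends only on how the scores rank the cases, whereas accuracy and F1-score change with the chosen threshold, which is one reason the three are reported together.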
The preprocessing methodology substantially reduced the number of features, which simplified computation, and identified critical features that had previously been overlooked. With these features integrated and the parameters tuned, our model generalized well to unseen data. This highlights the potential of ML as a tool in ICUs for improving resource allocation and supporting more personalized interventions for MV patients.