Ashrafi Negin, Abdollahi Armin, Alaei Kamiar, Pishgar Maryam
Department of Industrial and Systems Engineering, University of Southern California, Los Angeles, CA, USA.
Department of Electrical and Computer Engineering, University of Southern California, Los Angeles, CA, USA.
Sci Rep. 2025 Apr 2;15(1):11363. doi: 10.1038/s41598-025-95779-0.
Ventilator-associated pneumonia significantly increases morbidity, mortality, and healthcare costs among patients with traumatic brain injury. Accurately predicting risk can facilitate earlier interventions and improve patient outcomes. This study used the MIMIC-III database, identifying traumatic brain injury cases through standardized clinical criteria. The data preprocessing workflow included missing value imputation, correlation checks, and expert-driven feature selection, reducing an initial set of features to a subset of critical predictors encompassing demographics, comorbidities, laboratory values, and clinical interventions. To address class imbalance, the Synthetic Minority Oversampling Technique (SMOTE) was applied within a five-fold cross-validation framework, balancing each training set while keeping the validation data unbiased. Six machine learning models (Support Vector Machine, Logistic Regression, Random Forest, XGBoost, Artificial Neural Network, and AdaBoost) were trained with extensive hyperparameter tuning. Models were evaluated on multiple metrics, including Area Under the Curve (AUC), accuracy, F1 score, sensitivity, specificity, Positive Predictive Value, and Negative Predictive Value. XGBoost emerged as the top-performing algorithm, achieving an AUC of 0.94 and an accuracy of 0.875 on the test set, marking substantial improvements over previously reported best results. An ablation study validated the necessity of each retained feature, indicating that any feature removal led to a decline in model performance. Furthermore, SHAP analysis identified ICU length of stay, hospital length of stay, serum potassium, and blood urea nitrogen as key contributors to ventilator-associated pneumonia risk.
Overall, the results demonstrate that advanced ensemble learning, meticulous feature selection, and effective class imbalance handling can significantly enhance early detection of ventilator-associated pneumonia in traumatic brain injury cases. These findings have meaningful clinical implications, offering a framework for more timely interventions, optimized resource allocation, and improved patient care in critical care settings.
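The abstract states that SMOTE was applied within five-fold cross-validation so that validation data stay unbiased. The sketch below illustrates that idea only; it is not the authors' code. It uses a simplified SMOTE-style interpolation written with the standard library, and the function names, the stratified split, the neighbour count `k=3`, and the convention that label `1` marks the minority (pneumonia) class are all assumptions made for illustration.

```python
import random


def smote_like(minority, n_new, k=3, rng=None):
    """Create n_new synthetic points, each interpolated between a minority
    sample and one of its k nearest minority neighbours (simplified SMOTE)."""
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        others = [p for p in minority if p is not base]
        # Sort the remaining minority points by squared distance to base.
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


def stratified_folds(y, k, rng):
    """Deal the indices of each class round-robin into k folds."""
    folds = [[] for _ in range(k)]
    for label in sorted(set(y)):
        idx = [i for i, v in enumerate(y) if v == label]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return folds


def balanced_cv_splits(X, y, k=5, seed=0):
    """Yield (X_train, y_train, X_val, y_val) per fold.

    Only the training part is oversampled; the validation part contains
    original samples only, so scores are not inflated by synthetic data.
    """
    rng = random.Random(seed)
    for val_idx in stratified_folds(y, k, rng):
        held_out = set(val_idx)
        train_idx = [j for j in range(len(X)) if j not in held_out]
        X_tr = [X[j] for j in train_idx]
        y_tr = [y[j] for j in train_idx]
        minority = [x for x, label in zip(X_tr, y_tr) if label == 1]
        n_new = (len(y_tr) - len(minority)) - len(minority)
        if n_new > 0 and len(minority) >= 2:
            new_points = smote_like(minority, n_new, rng=rng)
            X_tr = X_tr + new_points
            y_tr = y_tr + [1] * len(new_points)
        yield X_tr, y_tr, [X[j] for j in val_idx], [y[j] for j in val_idx]


# Toy data (made up for illustration): 10 positives, 40 negatives.
X = [[float(i), float(i % 7)] for i in range(50)]
y = [1 if i < 10 else 0 for i in range(50)]
splits = list(balanced_cv_splits(X, y, k=5, seed=0))
```

Oversampling before splitting would let synthetic points derived from validation samples leak into training, which is why the resampling step sits inside the loop over folds.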
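Most of the evaluation metrics named in the abstract follow directly from the four cells of a binary confusion matrix. The sketch below shows those standard definitions; the function name and the example counts are hypothetical and are not taken from the study. AUC is omitted because it requires predicted scores rather than counts.

```python
def binary_metrics(tp, fp, tn, fn):
    """Standard binary classification metrics from confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),   # recall, true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
        "ppv": tp / (tp + fp),           # positive predictive value (precision)
        "npv": tn / (tn + fn),           # negative predictive value
        "f1": 2 * tp / (2 * tp + fp + fn),
    }


# Made-up counts, for illustration only.
metrics = binary_metrics(tp=30, fp=10, tn=50, fn=10)
```

With imbalanced classes, accuracy alone can look high even when few positive cases are found, which is one reason to report sensitivity, PPV, and NPV alongside it.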