Abdollahi Armin, Ashrafi Negin, Ma Xinghong, Zhang Jiahao, Wu Daijia, Wu Tongshou, Ye Zizheng, Pishgar Maryam
Andrew and Erna Viterbi School of Engineering, University of Southern California (USC), Los Angeles, CA, United States.
PLoS One. 2025 May 28;20(5):e0323441. doi: 10.1371/journal.pone.0323441. eCollection 2025.
Background: Stroke is the second-leading cause of disability and death among adults. Approximately 17 million people suffer a stroke each year, and about 85% of these are ischemic strokes. Predicting the mortality of ischemic stroke patients in the intensive care unit (ICU) is crucial for optimizing treatment strategies, allocating resources, and improving survival rates.
Methods: We acquired data on ICU ischemic stroke patients from the MIMIC-IV database, including diagnoses, vital signs, laboratory tests, medications, procedures, treatments, and clinical notes. Stroke patients were randomly divided into training (70%, n=2441), test (15%, n=523), and validation (15%, n=523) sets. To address class imbalance, we applied the Synthetic Minority Over-sampling Technique (SMOTE). We selected 30 features for model development, far fewer than the 1095 features used in the best previous study. We developed a deep learning model to assess mortality risk and implemented several baseline machine learning models for comparison.
Results: The XGB-DL model, which combines XGBoost for feature selection with deep learning, effectively minimized false positives. The model's AUROC improved from 0.865 (95% CI: 0.821-0.905) on the first day to 0.903 (95% CI: 0.868-0.936) by the fourth day, using data from 3,646 ICU mortality patients in the MIMIC-IV database, with an AUROC of 0.945 (95% CI: 0.944-0.947) during training. Although other machine learning models also performed well in terms of AUROC, we chose the deep learning model for its higher specificity.
Conclusion: Through enhanced feature selection and data cleaning, the proposed model demonstrates a 13% AUROC improvement over existing models while reducing the number of features from 1095 in previous studies to 30.
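The data-preparation steps named in the Methods (a random 70/15/15 split followed by SMOTE) can be illustrated with a minimal sketch. This is not the authors' code. The function names `split_70_15_15` and `smote`, the synthetic data, the 15% positive rate, the choice of k=5 neighbours, and applying SMOTE to the training set only are all assumptions made for illustration. The cohort size of 3,487 is simply the sum of the three reported split sizes. XGBoost feature selection and the deep learning model are omitted.

```python
import numpy as np

def split_70_15_15(n, rng):
    """Shuffle indices 0..n-1 and cut them into 70% / 15% / 15% parts."""
    idx = rng.permutation(n)
    n_train = int(round(0.70 * n))
    n_test = int(round(0.15 * n))
    return idx[:n_train], idx[n_train:n_train + n_test], idx[n_train + n_test:]

def smote(X_min, n_new, k, rng):
    """Create n_new synthetic minority rows by interpolating between a
    minority sample and one of its k nearest minority neighbours."""
    # Pairwise squared Euclidean distances within the minority class.
    d = ((X_min[:, None, :] - X_min[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d, np.inf)  # a point is not its own neighbour
    neighbours = np.argsort(d, axis=1)[:, :k]
    base = rng.integers(0, len(X_min), size=n_new)
    picked = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))  # interpolation weight in [0, 1)
    return X_min[base] + gap * (X_min[picked] - X_min[base])

# Hypothetical stand-in data: 30 features, made-up 15% positive rate.
rng = np.random.default_rng(0)
n, n_features = 3487, 30
X = rng.normal(size=(n, n_features))
y = (rng.random(n) < 0.15).astype(int)

train, test, val = split_70_15_15(n, rng)
X_tr, y_tr = X[train], y[train]

# Oversample the minority class in the training set only (an assumption),
# so the test and validation sets keep their original class balance.
X_min = X_tr[y_tr == 1]
n_new = int((y_tr == 0).sum() - (y_tr == 1).sum())
X_syn = smote(X_min, n_new, k=5, rng=rng)
X_bal = np.vstack([X_tr, X_syn])
y_bal = np.concatenate([y_tr, np.ones(n_new, dtype=int)])

print(len(train), len(test), len(val))
print(np.bincount(y_bal))
```

With n = 3487 the split sizes come out as 2441, 523, and 523, matching the counts reported in the Methods. After oversampling, the two classes in the training set are the same size.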