Park Sang Won, Yeo Na Young, Kang Seonguk, Ha Taejun, Kim Tae-Hoon, Lee DooHee, Kim Dowon, Choi Seheon, Kim Minkyu, Lee DongHoon, Kim DoHyeon, Kim Woo Jin, Lee Seung-Joon, Heo Yeon-Jeong, Moon Da Hye, Han Seon-Sook, Kim Yoon, Choi Hyun-Soo, Oh Dong Kyu, Lee Su Yeon, Park MiHyeon, Lim Chae-Man, Heo Jeongwon
Department of Medical Informatics, School of Medicine, Kangwon National University, Chuncheon, Korea.
Institute of Medical Science, School of Medicine, Kangwon National University, Chuncheon, Korea.
J Korean Med Sci. 2024 Feb 5;39(5):e53. doi: 10.3346/jkms.2024.39.e53.
Worldwide, sepsis is the leading cause of death in hospitals. If mortality rates in patients with sepsis can be predicted early, medical resources can be allocated efficiently. We constructed machine learning (ML) models to predict the mortality of patients with sepsis in a hospital emergency department.
This study prospectively collected nationwide data from an ongoing multicenter cohort of patients with sepsis identified in the emergency department. Patients were enrolled from 19 hospitals between September 2019 and December 2020. Using data from 3,657 survivors and 1,455 non-survivors, six ML models (logistic regression, support vector machine, random forest, extreme gradient boosting [XGBoost], light gradient boosting machine, and categorical boosting [CatBoost]) were constructed with fivefold cross-validation to predict mortality. With these models, the predictive performance of 44 clinical variables measured on the day of admission was compared with that of the six sequential organ failure assessment (SOFA) components (PaO2/FiO2 [PF], platelets [PLT], bilirubin, cardiovascular, Glasgow Coma Scale score, and creatinine). The confidence interval (CI) was obtained from 10,000 repeated random samplings of the test dataset. All results were explained and interpreted using Shapley additive explanations (SHAP).
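The CI procedure described above can be illustrated with a short sketch. It is not the authors' code: it assumes that the repeated random sampling is done with replacement (a bootstrap) and that the interval is taken from percentiles of the resampled AUCs, neither of which the abstract states. The function names `auc` and `resampled_auc_ci` are hypothetical.

```python
import random


def auc(labels, scores):
    """Area under the ROC curve, computed as the fraction of
    (positive, negative) pairs ranked correctly; ties count as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def resampled_auc_ci(labels, scores, n_rounds=10_000, alpha=0.05, seed=0):
    """Percentile interval for the AUC from repeated resampling of the test set.

    Assumption: sampling is with replacement and the interval is read off
    the empirical percentiles of the resampled AUC values.
    """
    auc(labels, scores)  # fails early if only one class is present
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_rounds:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # a resample with a single class has no defined AUC
        stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_rounds)]
    upper = stats[int((1 - alpha / 2) * n_rounds) - 1]
    return lower, upper
```

Here `labels` would hold the observed outcomes in the held-out test set (1 for death, 0 for survival) and `scores` the predicted mortality probabilities from one model. The pairwise AUC is quadratic in the sample size, so a rank-based implementation would be preferable for a full-size test set.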
Among the 5,112 participants, CatBoost exhibited the highest area under the curve (AUC) of 0.800 (95% CI, 0.756-0.840) using the clinical variables. Using the SOFA components for the same patients, XGBoost exhibited the highest AUC of 0.678 (95% CI, 0.626-0.730). As interpreted by SHAP, albumin, lactate, blood urea nitrogen, and international normalized ratio were found to significantly affect the results. Additionally, PF and PLT among the SOFA components significantly influenced the prediction results.
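To make concrete what a SHAP attribution measures, the following sketch computes exact Shapley values for a single prediction of a toy model. It is illustrative only and is not the method used in the study: it replaces "absent" features with a single baseline vector and enumerates every feature subset, whereas SHAP implementations typically average over background data and use faster model-specific algorithms. The function `shapley_values` and its arguments are hypothetical names.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley attribution of predict(x) - predict(baseline).

    predict  : function taking a list of feature values, returning a number
    x        : feature values of the case being explained
    baseline : reference values used when a feature is treated as absent
    """
    n = len(x)

    def value(subset):
        # Features in `subset` take their real values; the rest stay at baseline.
        mixed = [x[i] if i in subset else baseline[i] for i in range(n)]
        return predict(mixed)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            # Shapley weight for coalitions of size k that exclude feature i.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for coalition in combinations(others, k):
                s = set(coalition)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi
```

The attributions always sum to the difference between the prediction for `x` and the prediction for `baseline`, which is the property that lets each variable's contribution be read as a share of the model output. Enumeration costs grow exponentially with the number of features, so this form is only practical for a handful of variables.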
The newly established ML-based models achieved good prediction of mortality in patients with sepsis. For early prediction, using several clinical variables acquired at baseline can provide more accurate results than using the SOFA components. Additionally, the impact of each variable on the prediction was identified.