Xie Jingyuan, Gao Jiandong, Yang Mutian, Zhang Ting, Liu Yecheng, Chen Yutong, Liu Zetong, Mei Qimin, Li Zhimao, Zhu Huadong, Wu Ji
Department of Electronic Engineering, Tsinghua University, Beijing 100084, China.
Center for Big Data and Clinical Research, Institute for Precision Medicine, Tsinghua University, Beijing 100084, China.
World J Emerg Med. 2024;15(5):379-385. doi: 10.5847/wjem.j.1920-8642.2024.074.
Sepsis is one of the main causes of mortality in intensive care units (ICUs). Early prediction is critical for reducing injury. As approximately 36% of sepsis cases occur within 24 h after emergency department (ED) admission in the Medical Information Mart for Intensive Care IV (MIMIC-IV) database, a prediction system for the ED triage stage would be helpful. Previous methods such as the quick Sequential Organ Failure Assessment (qSOFA) are more suitable for screening than for prediction in the ED. We therefore aimed to find a lightweight, convenient prediction method through machine learning.
We obtained ED data for sepsis patients from MIMIC-IV. Our dataset comprised demographic information, vital signs, and synthetic features. Extreme Gradient Boosting (XGBoost) was used to predict the risk of developing sepsis within 24 h after ED admission. SHapley Additive exPlanations (SHAP) was employed to interpret the model's results. Ten percent of the patients were randomly selected as the testing set, and the remaining patients were used for training with 10-fold cross-validation.
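The data split described above can be sketched with the standard library alone. This is a minimal illustration, not the authors' code: the function names, the fixed seed, the round-robin fold assignment, and the use of integer patient IDs are all assumptions, and the total of 16,619 patients is simply the sum of the reported training and testing counts.

```python
import random


def split_holdout_and_folds(patient_ids, test_fraction=0.1, n_folds=10, seed=0):
    """Hold out a random test set, then partition the rest into CV folds.

    Hypothetical helper: the abstract only states a random 10% test set and
    10-fold cross-validation on the remaining patients.
    """
    ids = list(patient_ids)
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(ids)
    n_test = round(len(ids) * test_fraction)
    test_ids, train_ids = ids[:n_test], ids[n_test:]
    # Deal the training IDs round-robin so fold sizes differ by at most one.
    folds = [train_ids[i::n_folds] for i in range(n_folds)]
    return test_ids, train_ids, folds


def cv_rounds(folds):
    """Yield (training_ids, validation_ids) for each cross-validation round."""
    for k, validation in enumerate(folds):
        training = [pid for j, fold in enumerate(folds) if j != k for pid in fold]
        yield training, validation


# 16,619 = 14,957 training samples + 1,662 test patients reported in the abstract.
test_ids, train_ids, folds = split_holdout_and_folds(range(16619))
print(len(test_ids), len(train_ids), len(folds))
```

With 16,619 IDs this yields 1,662 held-out patients and 14,957 for cross-validation, matching the counts in the abstract. In each round a model would be fitted on `training` and scored on `validation`; the held-out IDs are touched only once, for the final evaluation.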
In 10-fold cross-validation on 14,957 samples, the model reached an accuracy of 84.1%±0.3% and an area under the receiver operating characteristic (ROC) curve of 0.92±0.02. It achieved similar performance on the testing set of 1,662 patients. SHAP values showed that the five most important features were acuity, arrival transportation, age, shock index, and respiratory rate.
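Two quantities in these results lend themselves to a short worked sketch: the shock index and the area under the ROC curve. The code below is illustrative only. It assumes the conventional definition of shock index (heart rate divided by systolic blood pressure), which the abstract does not spell out, and it assumes the ± values are a mean and sample standard deviation across folds. All function names and the example numbers are hypothetical.

```python
from statistics import mean, stdev


def shock_index(heart_rate, systolic_bp):
    """Conventional shock index: heart rate (beats/min) over systolic BP (mmHg)."""
    return heart_rate / systolic_bp


def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive case receives a
    higher score than a randomly chosen negative case; ties count as half.
    Quadratic in sample size, so suitable for illustration only.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def summarize(fold_values):
    """Mean and sample standard deviation across cross-validation folds."""
    return mean(fold_values), stdev(fold_values)


# Toy example: three of the four positive/negative pairs are ranked correctly.
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
print(round(shock_index(110, 85), 2))  # 1.29
```

Computing `roc_auc` once per validation fold and passing the ten values to `summarize` would give a figure in the same "mean ± spread" form as the one reported above.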
Machine learning models such as XGBoost may be used to predict sepsis from only a small amount of data that is conveniently collected at the ED triage stage. This may help reduce workload in the ED and alert medical workers to the risk of sepsis in advance.