[Constructing a predictive model for the death risk of patients with septic shock based on supervised machine learning algorithms].

Authors

Xie Zheng, Jin Jing, Liu Dongsong, Lu Shengyi, Yu Hui, Han Dong, Sun Wei, Huang Ming

Affiliations

Department of Emergency, Affiliated Hospital of Jiangnan University, Wuxi 214000, Jiangsu, China.

Department of Neurology, Affiliated Hospital of Jiangnan University, Wuxi 214000, Jiangsu, China. Corresponding author: Huang Ming, Email:

Publication information

Zhonghua Wei Zhong Bing Ji Jiu Yi Xue. 2024 Apr;36(4):345-352. doi: 10.3760/cma.j.cn121430-20230930-00832.

Abstract

OBJECTIVE

To construct and validate the best predictive model for 28-day death risk in patients with septic shock based on different supervised machine learning algorithms.

METHODS

Patients with septic shock who met the Sepsis-3 criteria were selected from the Medical Information Mart for Intensive Care-IV v2.0 (MIMIC-IV v2.0) database. Patients were randomly allocated to a training set (70%) and a validation set (30%). Candidate predictive variables were extracted from three aspects: (1) demographic characteristics and basic vital signs; (2) serum indicators within 24 hours of intensive care unit (ICU) admission and complications that might affect those indicators; and (3) functional scores and advanced life support. Models for predicting 28-day death in patients with septic shock were built with five mainstream machine learning algorithms: classification and regression tree (CART), random forest (RF), support vector machine (SVM), logistic regression (LR), and super learner [SL; combining CART, RF, and extreme gradient boosting (XGBoost)]. Their predictive efficacy was compared, and the best algorithm model was selected. The optimal predictive variables were determined by intersecting the variables selected by LASSO regression, RF, and XGBoost, and a predictive model was constructed from them. The model's predictive efficacy was validated with the receiver operating characteristic (ROC) curve, its accuracy was assessed with calibration curves, and its practicality was verified with decision curve analysis (DCA).
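
The random 70/30 allocation and the intersection-based variable selection described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the helper names `split_70_30` and `intersect_selected`, the seed, the round-up rule, and the variable names in each selector's list are all hypothetical. In practice each list would come from a fitted LASSO, RF, or XGBoost model.

```python
import random


def split_70_30(patient_ids, seed=0):
    """Randomly allocate roughly 70% of patients to training, the rest to validation."""
    ids = list(patient_ids)
    random.Random(seed).shuffle(ids)  # seed is an assumption, for reproducibility only
    # Integer ceiling of 0.7 * n. Rounding up is an assumption chosen so that
    # 3295 patients give the 2307/988 set sizes reported in the abstract.
    cut = (7 * len(ids) + 9) // 10
    return ids[:cut], ids[cut:]


def intersect_selected(*selections):
    """Keep only the variables chosen by every selection method."""
    common = set(selections[0])
    for chosen in selections[1:]:
        common &= set(chosen)
    return sorted(common)


# Hypothetical selector outputs; real lists would come from fitted models.
lasso_vars = ["age", "lactate_max", "ph_max", "bmi", "sodium_min"]
rf_vars = ["age", "lactate_max", "ph_max", "bmi", "heart_rate"]
xgb_vars = ["age", "lactate_max", "ph_max", "creatinine_max", "bmi"]

train_ids, valid_ids = split_70_30(range(3295))
optimal_vars = intersect_selected(lasso_vars, rf_vars, xgb_vars)
print(len(train_ids), len(valid_ids))
print(optimal_vars)
```

Taking the intersection is a conservative choice: a variable survives only if all three methods agree on it, which tends to give a smaller and more stable variable set than any single method alone.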

RESULTS

A total of 3 295 patients with septic shock were included; 2 164 survived and 1 131 died within 28 days, a mortality of 34.32%. Of these, 2 307 were in the training set (792 deaths within 28 days, a mortality of 34.33%) and 988 were in the validation set (339 deaths within 28 days, a mortality of 34.31%). Five machine learning models were established on the training set data. With variables from all three aspects included, the areas under the ROC curve (AUC) of the RF, SVM, and LR models for predicting 28-day death in septic shock patients in the validation set were 0.823 [95% confidence interval (95%CI) 0.795-0.849], 0.823 (95%CI 0.796-0.849), and 0.810 (95%CI 0.782-0.838), respectively, higher than those of the CART model (AUC = 0.750, 95%CI 0.717-0.782) and the SL model (AUC = 0.756, 95%CI 0.724-0.789). These three algorithms were therefore judged the best algorithm models. After integrating variables from the three aspects, 16 optimal predictive variables were identified by intersecting the selections of LASSO regression, RF, and XGBoost: the highest pH value, the highest albumin (Alb), the highest body temperature, the lowest lactic acid (Lac), the highest Lac, the highest serum creatinine (SCr), the highest Ca, the lowest hemoglobin (Hb), the lowest white blood cell count (WBC), age, simplified acute physiology score III (SAPS III), the highest WBC, acute physiology score III (APS III), the lowest Na, body mass index (BMI), and the shortest activated partial thromboplastin time (APTT) within 24 hours of ICU admission. ROC curve analysis showed that the logistic regression model constructed with these 16 optimal predictive variables was the best predictive model, with an AUC of 0.806 (95%CI 0.778-0.835) in the validation set. The calibration curve and DCA curve showed that this model had high accuracy and a maximum net benefit of up to 0.3, and it significantly outperformed traditional models based on a single functional score [APS III score, SAPS III score, and sequential organ failure assessment (SOFA) score], whose AUCs (95%CI) were 0.746 (0.715-0.778), 0.765 (0.734-0.796), and 0.625 (0.589-0.661), respectively.
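
For context on the net benefit figure, decision curve analysis conventionally defines net benefit at a threshold probability p_t as TP/n - FP/n × p_t/(1 - p_t). The sketch below applies that standard definition to the "treat all" reference strategy, using the validation-set counts from the abstract (339 deaths among 988 patients). The function name `net_benefit` and the chosen thresholds are illustrative assumptions, and the model's own decision curve cannot be reproduced without patient-level predictions.

```python
def net_benefit(true_pos, false_pos, n, threshold):
    """Standard DCA net benefit: TP/n - FP/n * p_t / (1 - p_t)."""
    if not 0 < threshold < 1:
        raise ValueError("threshold must lie strictly between 0 and 1")
    return true_pos / n - (false_pos / n) * threshold / (1 - threshold)


# Validation-set counts reported in the abstract.
n_valid, deaths = 988, 339
survivors = n_valid - deaths

# "Treat all" flags every patient as high risk, so all deaths count as
# true positives and all survivors count as false positives.
for p_t in (0.1, 0.2, 0.3):  # illustrative thresholds, not from the paper
    nb_all = net_benefit(deaths, survivors, n_valid, p_t)
    print(f"p_t={p_t:.1f}  treat-all net benefit={nb_all:.3f}")
```

Net benefit can never exceed the event rate (339/988, about 0.343 in the validation set), which gives a sense of scale for the reported maximum of about 0.3.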

CONCLUSIONS

The logistic regression model constructed with 16 optimal predictive variables, including pH value, Alb, body temperature, Lac, SCr, Ca, Hb, WBC, SAPS III score, APS III score, Na, BMI, and APTT, was identified as the best predictive model for 28-day death risk in patients with septic shock. Its performance is stable, with high discriminative ability and accuracy.
