Machine learning for the early prediction of acute respiratory distress syndrome (ARDS) in patients with sepsis in the ICU based on clinical data.

Authors

Jiang Zhenzhen, Liu Leping, Du Lin, Lv Shanshan, Liang Fang, Luo Yanwei, Wang Chunjiang, Shen Qin

Affiliations

Department of Blood Transfusion, The Third Xiangya Hospital, Central South University, Changsha, China.

Department of Pediatrics, The Third Xiangya Hospital, Central South University, Changsha, China.

Publication information

Heliyon. 2024 Mar 13;10(6):e28143. doi: 10.1016/j.heliyon.2024.e28143. eCollection 2024 Mar 30.

DOI:10.1016/j.heliyon.2024.e28143

PMID:38533071

Original link: https://pmc.ncbi.nlm.nih.gov/articles/PMC10963609/

Abstract

BACKGROUND

Acute respiratory distress syndrome (ARDS) is a potentially fatal complication of severe sepsis. Machine learning models can help predict ARDS accurately and at an early stage in patients with sepsis.

OBJECTIVE

We aim to develop a machine-learning model for predicting ARDS in patients with sepsis in the intensive care unit (ICU).

METHODS

Initial clinical data from patients with sepsis admitted to the hospital (including demographic characteristics, clinical diagnoses, complications, and laboratory tests) were used to predict ARDS and to screen for the crucial variables. Eight algorithms were compared: XGBoost, logistic regression, LightGBM, random forest, Gaussian naive Bayes (GaussianNB), complement naive Bayes (ComplementNB), support vector machine (SVM), and K-nearest neighbors (KNN). The prediction model was then rebuilt with the best-performing algorithm: 10% of the data were randomly selected as a test set, and the remainder was used for training with cross-validation. Model performance was evaluated using the area under the curve (AUC), sensitivity, accuracy, specificity, positive and negative predictive values, F1 score, kappa value, and the clinical decision curve. Finally, the application of the model was illustrated with the SHAP package.
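The hold-out and cross-validation scheme described above can be sketched as follows. This is a minimal illustration on synthetic data, not the authors' implementation: the abstract does not state the number of folds or the software used, so the fold count (5), the random seeds, the synthetic features, and the hand-written `GaussianNaiveBayes` class are all assumptions made for the example.

```python
import math
import random
from statistics import mean, pvariance


def split_holdout(n_rows, test_fraction=0.10, seed=0):
    """Shuffle row indices and set aside test_fraction of them as a test set."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    n_test = max(1, round(n_rows * test_fraction))
    return indices[n_test:], indices[:n_test]  # (development, test)


def k_folds(indices, k=5):
    """Yield (train, validation) index lists for k-fold cross-validation."""
    folds = [indices[i::k] for i in range(k)]
    for i, validation in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation


class GaussianNaiveBayes:
    """Minimal Gaussian naive Bayes for numeric features (illustrative only)."""

    def fit(self, X, y):
        self.params = {}
        for label in set(y):
            rows = [x for x, target in zip(X, y) if target == label]
            columns = list(zip(*rows))
            self.params[label] = (
                math.log(len(rows) / len(X)),                # log prior
                [mean(col) for col in columns],              # per-feature mean
                [pvariance(col) + 1e-9 for col in columns],  # smoothed variance
            )
        return self

    def _log_joint(self, x, label):
        log_prior, means, variances = self.params[label]
        return log_prior + sum(
            -0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)
            for xi, m, v in zip(x, means, variances)
        )

    def predict(self, X):
        return [max(self.params, key=lambda c: self._log_joint(x, c)) for x in X]


def accuracy(model, X, y, idx):
    """Fraction of rows in idx that the model classifies correctly."""
    predictions = model.predict([X[i] for i in idx])
    return sum(p == y[i] for p, i in zip(predictions, idx)) / len(idx)


# Synthetic stand-in for the clinical table (assumption): 200 rows, 2 features.
rng = random.Random(42)
y = [i % 2 for i in range(200)]
X = [[rng.gauss(3.0 * label, 1.0), rng.gauss(-2.0 * label, 1.0)] for label in y]

# 10% held out for testing; the remainder is used for cross-validation.
development, test = split_holdout(len(X), test_fraction=0.10, seed=1)
cv_scores = []
for train, validation in k_folds(development, k=5):
    model = GaussianNaiveBayes().fit([X[i] for i in train], [y[i] for i in train])
    cv_scores.append(accuracy(model, X, y, validation))

# Refit on all development rows, then score once on the untouched test rows.
final_model = GaussianNaiveBayes().fit(
    [X[i] for i in development], [y[i] for i in development]
)
test_accuracy = accuracy(final_model, X, y, test)
print("cross-validation accuracies:", cv_scores)
print("test accuracy:", test_accuracy)
```

The test rows are touched only once, after cross-validation has finished, so the reported test score is not influenced by any choices made during model comparison.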

RESULTS

Ten critical features were screened using the lasso method: PaO2/PAO2, A-aDO2, PO2(T), CRP, gender, PO2, RDW, MCH, SG, and chlorine. The variable importance ranking showed that PaO2/PAO2 was the most significant variable. Among the eight algorithms, the Gaussian NB algorithm performed significantly better than the others. After the model was rebuilt with the best algorithm, the AUCs in the training and validation sets were 0.777 and 0.770, respectively, and the model performed well in the test set (AUC = 0.781, accuracy = 78.6%, sensitivity = 82.4%, F1 score = 0.824). A comparison based on the factors that overlap with those of previous models indicated that the model we developed performs better.
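For readers unfamiliar with the evaluation metrics named in this abstract, the sketch below shows how they are conventionally computed from predicted scores. It uses a small made-up example, not the study data; the function names, the 0.5 decision threshold, and the toy labels and scores are assumptions for illustration. Net benefit is the quantity plotted on a decision curve, computed here with the standard formula TP/n - (FP/n) * pt/(1 - pt) at threshold probability pt.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels coded as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def classification_metrics(y_true, y_pred):
    """Threshold-based metrics; assumes both classes occur in truth and prediction."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    n = tp + fp + tn + fn
    accuracy = (tp + tn) / n
    # Agreement expected by chance, used by Cohen's kappa.
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    return {
        "accuracy": accuracy,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "f1": 2 * tp / (2 * tp + fp + fn),
        "kappa": (accuracy - expected) / (1 - expected),
    }


def roc_auc(y_true, scores):
    """AUC as the probability that a positive outscores a negative (ties count 0.5)."""
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if p > q else 0.5 if p == q else 0.0
        for p in positives
        for q in negatives
    )
    return wins / (len(positives) * len(negatives))


def net_benefit(y_true, scores, threshold):
    """Decision-curve net benefit at one threshold probability (0 < threshold < 1)."""
    y_pred = [int(s >= threshold) for s in scores]
    tp, fp, _, _ = confusion_counts(y_true, y_pred)
    n = len(y_true)
    return tp / n - (fp / n) * threshold / (1 - threshold)


# Toy example (made-up values): 4 positives and 4 negatives.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.4, 0.2, 0.1]
y_pred = [int(s >= 0.5) for s in scores]

metrics = classification_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
benefit = net_benefit(y_true, scores, threshold=0.5)
print(metrics)
print("AUC:", auc)
print("net benefit at 0.5:", benefit)
```

AUC depends only on how the scores rank patients, while the other metrics depend on the chosen threshold, which is why a model can report similar AUC and accuracy values yet quite different sensitivity.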

CONCLUSION

Sepsis-associated ARDS can be accurately predicted at an early stage by a machine learning model based on existing clinical data. These findings are helpful for the accurate identification of patients with sepsis-associated ARDS and for improving their prognosis.
