Wu Hongsheng, Liao Biling, Ji Tengfei, Ma Keqiang, Luo Yumei, Zhang Shengmin
Hepatobiliary Pancreatic Surgery Department, Huadu District People's Hospital of Guangzhou, Guangzhou, China.
Front Med (Lausanne). 2025 Jan 6;11:1496869. doi: 10.3389/fmed.2024.1496869. eCollection 2024.
Sepsis is a life-threatening disease with a high mortality rate, which underscores the need for new models to predict prognosis in this patient population. This study compared the performance of traditional logistic regression and machine learning models in predicting mortality in adult sepsis.
To develop an optimal model for predicting mortality in adult sepsis patients by comparing traditional logistic regression with machine learning methods.
A retrospective analysis was conducted on 606 adult sepsis inpatients treated at our medical center between January 2020 and December 2022, who were randomly divided into training and validation sets in a 7:3 ratio. Traditional logistic regression and machine learning methods were used to predict mortality in adult sepsis. Univariate analysis screened candidate risk factors for the logistic regression model, while Least Absolute Shrinkage and Selection Operator (LASSO) regression performed variable shrinkage and selection for the machine learning model. Among the machine learning models evaluated, which included Bagged Tree, the one with the maximum area under the curve (AUC) was chosen for model construction. Model validation and comparison with the Sequential Organ Failure Assessment (SOFA) and the Acute Physiology and Chronic Health Evaluation (APACHE) scores were performed in the validation set using receiver operating characteristic (ROC) curves, calibration curves, and decision curve analysis (DCA) curves.
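The 7:3 split and the AUC-based model choice described above can be sketched as follows. This is a minimal illustration in plain Python, not the authors' code: the function names, the random seed, and the toy scores are all hypothetical, and the AUC is computed with the rank-based (Mann-Whitney) definition rather than with any particular statistics package.

```python
import random


def split_train_validation(ids, train_fraction=0.7, seed=0):
    """Shuffle ids reproducibly and split them into training and validation lists."""
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)  # seed is an arbitrary, assumed value
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def auc(labels, scores):
    """AUC as the probability that a random positive case outranks a random
    negative case; tied scores count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def best_model_by_auc(labels, scores_by_model):
    """Return the name of the model with the highest AUC and all AUC values."""
    aucs = {name: auc(labels, s) for name, s in scores_by_model.items()}
    return max(aucs, key=aucs.get), aucs


if __name__ == "__main__":
    # 606 placeholder patient indices, split 7:3 as in the study design.
    train_ids, valid_ids = split_train_validation(range(606))
    print(len(train_ids), len(valid_ids))

    # Toy outcome labels (1 = died) and made-up predicted risks for two models.
    toy_labels = [1, 0, 1, 0, 0, 1]
    toy_scores = {
        "model_a": [0.9, 0.2, 0.7, 0.4, 0.1, 0.8],
        "model_b": [0.6, 0.5, 0.4, 0.7, 0.2, 0.9],
    }
    print(best_model_by_auc(toy_labels, toy_scores))
```

The pairwise AUC is quadratic in the number of cases, which is acceptable for a cohort of a few hundred patients; a sort-based rank formula would be the usual choice for larger data.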
Univariate analysis was used to assess 17 variables: gender, history of coronary heart disease (CHD), systolic pressure, white blood cell count (WBC), neutrophil count (NEUT), lymphocyte count (LYMP), lactic acid, neutrophil-to-lymphocyte ratio (NLR), red blood cell distribution width (RDW), interleukin-6 (IL-6), prothrombin time (PT), international normalized ratio (INR), fibrinogen (FBI), D-dimer, aspartate aminotransferase (AST), total bilirubin (Tbil), and lung infection. All of these variables differed significantly (P < 0.05) between the survival and non-survival groups. Backward stepwise regression identified systolic pressure, lactic acid, NLR, RDW, IL-6, PT, and Tbil as independent risk factors. These factors were incorporated into a logistic regression model, chosen for its minimum Akaike Information Criterion (AIC) value (98.65). Among the machine learning techniques applied, the random forest (RF) model had the maximum AUC (0.999) and was selected. LASSO regression with the lambda.1se criterion identified systolic pressure, lactic acid, NEUT, RDW, IL-6, INR, and Tbil as the variables for constructing the RF model, which was validated by ten-fold cross-validation. The RF model was then validated and compared with the traditional logistic regression model and with SOFA and APACHE scoring.
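The lambda.1se criterion mentioned above follows the one-standard-error rule: among the penalty values tried in cross-validation, take the largest one whose mean error stays within one standard error of the minimum. The sketch below illustrates that rule in plain Python under stated assumptions. The function name and the numbers are hypothetical and are not taken from the study; the input is assumed to be a per-lambda summary of cross-validated error such as a LASSO fitting routine would report.

```python
def lambda_min_and_1se(lambdas, mean_errors, std_errors):
    """Return (lambda_min, lambda_1se) from cross-validation summaries.

    lambdas      penalty values that were tried
    mean_errors  mean cross-validated error for each penalty value
    std_errors   standard error of that mean for each penalty value
    """
    if not lambdas or not (len(lambdas) == len(mean_errors) == len(std_errors)):
        raise ValueError("inputs must be non-empty and of equal length")

    # Penalty value with the lowest mean cross-validated error.
    i_min = min(range(len(lambdas)), key=lambda i: mean_errors[i])

    # Accept any penalty whose error is within one standard error of that minimum.
    threshold = mean_errors[i_min] + std_errors[i_min]
    eligible = [lam for lam, err in zip(lambdas, mean_errors) if err <= threshold]

    # The largest eligible penalty gives the sparsest acceptable model.
    return lambdas[i_min], max(eligible)


if __name__ == "__main__":
    # Made-up cross-validation summary, for illustration only.
    toy_lambdas = [1.0, 0.5, 0.1, 0.01]
    toy_means = [0.50, 0.32, 0.30, 0.31]
    toy_ses = [0.02, 0.02, 0.03, 0.03]
    print(lambda_min_and_1se(toy_lambdas, toy_means, toy_ses))
```

Choosing the 1-SE penalty over the minimum-error penalty trades a small amount of cross-validated accuracy for fewer retained variables, which is consistent with a seven-variable model being carried forward.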
Based on deep machine learning principles, the RF model demonstrates advantages over traditional logistic regression models in predicting adult sepsis prognosis. The RF model holds significant potential for clinical surveillance and interventions to enhance outcomes for sepsis patients.