Prediction of bacteremia using routine hematological and metabolic parameters based on logistic regression and random forest models.
Author information
Wang Ting-Qiang, Zhuo Ying, Lv Chun-E, Shi Jing, Yao Ling-Hui, Zhang Shi-Yan, Shi Jinbao
Affiliations
Department of Clinical Laboratory, Fuding Hospital, Fujian University of Traditional Chinese Medicine, Fuding, Fujian, China.
Department of Anesthesiology, Fuding Hospital, Fujian University of Traditional Chinese Medicine, Fuding, Fujian, China.
Publication information
Front Cell Infect Microbiol. 2025 Jul 28;15:1605485. doi: 10.3389/fcimb.2025.1605485. eCollection 2025.
BACKGROUND
This study aimed to evaluate the predictive utility of routine hematological, inflammatory, and metabolic markers for bacteremia and to compare the classification performance of logistic regression and random forest models.
METHODS
A retrospective study was conducted on 287 inpatients who underwent blood culture testing at Fuding Hospital, Fujian University of Traditional Chinese Medicine, between March and August 2024. Patients were divided into bacteremia (n = 137) and non-bacteremia (n = 150) groups based on blood culture results. Hematological indices, inflammatory markers (e.g., C-reactive protein (CRP), procalcitonin (PCT)), metabolic indices (e.g., glucose, cholesterol), and nutritional markers (e.g., albumin) were analyzed. Univariate and multivariate binary logistic regression analyses were used to identify independent risk factors. Logistic regression and random forest models were developed using 33 features with a 70:30 train-test split and evaluated using receiver operating characteristic (ROC) curves, confusion matrices, and standard classification metrics.
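As an illustration of the evaluation step described above, the following minimal sketch computes a confusion matrix, recall (sensitivity), and the area under the ROC curve in plain Python. The labels and scores are made-up toy values, not data from the study, and the function names and the 0.5 decision threshold are assumptions chosen for the example; the abstract does not state which software or threshold the authors used.

```python
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (TP, FP, FN, TN) for binary labels coded as 1 = bacteremia, 0 = no bacteremia."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def recall(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Recall (sensitivity) = TP / (TP + FN)."""
    tp, _, fn, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fn) if (tp + fn) else 0.0


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win (the Mann-Whitney formulation of the AUC).
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example (hypothetical values, NOT study data).
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.6, 0.4, 0.7, 0.3, 0.2]          # predicted probabilities
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

print("TP, FP, FN, TN:", confusion_counts(y_true, y_pred))
print("recall:", round(recall(y_true, y_pred), 3))
print("AUC:", round(roc_auc(y_true, scores), 3))
```

Recall depends on the chosen threshold while AUC does not, which is one reason two models can have nearly identical AUC values yet differ noticeably in recall.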
RESULTS
Hemoglobin, cholesterol, and albumin levels were significantly lower in the bacteremia group, while platelet count, CRP, PCT, glucose, and triglycerides were significantly elevated (all p < 0.05). Logistic regression identified platelet count (odds ratio (OR) = 1.003, 95% confidence interval (CI): 1.001-1.006), PCT (OR = 1.032, 95% CI: 1.004-1.060), triglycerides (OR = 1.740, 95% CI: 1.052-2.879), and low cholesterol (OR = 0.523, 95% CI: 0.383-0.714; an OR below 1 means the odds of bacteremia fall as cholesterol rises) as independent risk factors. The area under the ROC curve (AUC) was 0.75 for the random forest model and 0.74 for logistic regression, with recall rates of 0.69 and 0.60, respectively.
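An odds ratio from logistic regression applies to a one-unit change in the predictor, so an OR close to 1 (such as 1.003 for platelet count) can still correspond to a sizeable effect over a clinically meaningful range. The sketch below rescales a per-unit OR to a k-unit change using OR^k = exp(k · ln OR). The helper name and the choice of a 100-unit change are assumptions made for illustration; the abstract does not state the measurement units, and because the reported ORs are rounded, the rescaled values are only approximate.

```python
import math


def rescale_odds_ratio(or_per_unit: float, units: float) -> float:
    """Convert a per-unit odds ratio into the odds ratio for a change of `units`.

    Uses OR_k = exp(k * ln(OR_1)), which follows from the log-linear form
    of the logistic regression model.
    """
    if or_per_unit <= 0:
        raise ValueError("an odds ratio must be positive")
    return math.exp(units * math.log(or_per_unit))


# Reported platelet-count OR and 95% CI from the abstract (per one unit).
platelet_or, ci_low, ci_high = 1.003, 1.001, 1.006

# Hypothetical 100-unit increase; the unit itself is not given in the abstract.
k = 100
print("OR per", k, "units:", round(rescale_odds_ratio(platelet_or, k), 2))
print("95% CI:", round(rescale_odds_ratio(ci_low, k), 2), "to",
      round(rescale_odds_ratio(ci_high, k), 2))
```

The same transformation applies to the confidence limits because the interval is symmetric on the log-odds scale.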
CONCLUSION
Routine laboratory markers integrated into machine learning models demonstrated potential for early bacteremia prediction. The random forest model exhibited higher sensitivity than logistic regression (recall 0.69 vs. 0.60) at a similar AUC, suggesting its potential utility as a clinical screening tool.