Marandi Ramtin Zargari, Hertz Frederik Boetius, Thomassen Jesper Qvist, Rasmussen Steen Christian, Frikke-Schmidt Ruth, Frimodt-Møller Niels, Nielsen Karen Leth, MacPherson Cameron Ross
Centre of Excellence for Health, Immunity and Infections (CHIP), Rigshospitalet, Copenhagen University Hospital, Copenhagen, Denmark.
Department of Clinical Microbiology 9301, Copenhagen University Hospital, Rigshospitalet, Henrik Harpestrengs vej 4A, 2100, Copenhagen, Denmark.
Sci Rep. 2025 May 20;15(1):17478. doi: 10.1038/s41598-025-01821-6.
Early diagnosis of bloodstream infection (BSI) is crucial for informed antibiotic use. This study developed a machine learning (ML) approach for early BSI detection using a comprehensive dataset from Rigshospitalet, Denmark (2010-2020). The dataset included 144,398 samples from adult patients, with blood culture results, demographics, and up to 36 biochemical variables. Blood cultures were positive in 6.4% of samples, most commonly due to Staphylococcus aureus, Escherichia coli, and Enterococcus faecium. Eighty percent of the samples (N = 43,351 patients) were used for ML model development and five-fold cross-validation, and the remaining 20% (N = 10,837) were held out for independent testing. Of the seven models evaluated, LightGBM performed best, achieving an AUC of 0.69 on the test set. It was more reliable at identifying negative cultures than positive ones: its negative predictive value (NPV) was 0.96 and specificity 0.74, whereas its positive predictive value (PPV) was 0.13 and sensitivity 0.54. SHapley Additive exPlanations (SHAP) identified platelets, leukocytes, and the neutrophil-to-lymphocyte ratio as the three most predictive features. The model showed higher sensitivity for common pathogens (average 0.66), e.g., 0.71 for E. coli. These results highlight the potential of biochemical variables as diagnostic factors for BSI, suggest that clinical use should focus on identifying low-risk patients, and indicate that the approach can be further improved in future investigations.
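The contrast between the high NPV and the low PPV follows largely from the low prevalence of positive cultures. The sketch below recomputes PPV and NPV from the reported sensitivity and specificity via Bayes' theorem. It assumes that the overall 6.4% positivity rate also applies to the test set, which the abstract does not state, and the function name `predictive_values` is purely illustrative.

```python
def predictive_values(sensitivity: float, specificity: float,
                      prevalence: float) -> tuple[float, float]:
    """Return (PPV, NPV) for a test with the given operating point and prevalence."""
    tp = sensitivity * prevalence              # true positives (fraction of all samples)
    fn = (1 - sensitivity) * prevalence        # false negatives
    tn = specificity * (1 - prevalence)        # true negatives
    fp = (1 - specificity) * (1 - prevalence)  # false positives
    ppv = tp / (tp + fp)  # P(culture positive | model predicts positive)
    npv = tn / (tn + fn)  # P(culture negative | model predicts negative)
    return ppv, npv


if __name__ == "__main__":
    # Sensitivity and specificity are taken from the abstract; the prevalence is
    # an assumption (overall positivity rate, not the reported test-set rate).
    ppv, npv = predictive_values(sensitivity=0.54, specificity=0.74, prevalence=0.064)
    print(f"PPV = {ppv:.2f}, NPV = {npv:.2f}")  # PPV = 0.12, NPV = 0.96
```

With these inputs the NPV matches the reported 0.96. The PPV comes out at 0.12 rather than 0.13, which may reflect a slightly different positivity rate in the test set. The calculation shows that at roughly 6% prevalence, even a moderately specific model yields a high NPV. This is consistent with the suggestion to use the model mainly for identifying low-risk patients.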