Wang Neng, Tao Shuai, Chen Liang
Department of Liver Disease, Shanghai Public Health Clinical Center, Fudan University, Shanghai, China.
Research Unit, Shanghai Public Health Clinical Center, Fudan University, Shanghai, 201508, China.
BMC Infect Dis. 2025 Jul 1;25(1):847. doi: 10.1186/s12879-025-11199-5.
To develop and validate a machine learning-based diagnostic model for detecting bacterial infections in patients with hepatitis B virus-related acute-on-chronic liver failure (HBV-ACLF), with a focus on early clinical identification and model interpretability.
We conducted a retrospective cohort study of HBV-ACLF patients diagnosed at the Shanghai Public Health Clinical Center between January 2014 and January 2024. Patients were categorized into two groups, those with and those without bacterial infection, based on clinical assessment and microbiological evidence. Feature selection was performed in two steps: univariate logistic regression, followed by multivariate logistic regression with a significance threshold of p < 0.05. Six machine learning algorithms were used to construct predictive models: Extreme Gradient Boosting (XGBoost), Support Vector Machine (SVM), Logistic Regression (LR), K-Nearest Neighbors (KNN), Random Forest (RF), and Decision Tree (DT). Hyperparameters were optimized by grid search, and each model was evaluated using the area under the receiver operating characteristic curve (AUC), accuracy, precision, recall, and F1 score. The Shapley Additive Explanations (SHAP) algorithm was used to analyze the contribution of each feature in the optimal model.
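The threshold-based metrics named above (accuracy, precision, recall, F1 score) can all be derived from a confusion matrix. The following is a minimal sketch, not the authors' code: the function name `classification_metrics` and the example counts are hypothetical and chosen only for illustration. AUC is not covered here because it requires ranked prediction scores rather than counts.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts.

    tp/fp/fn/tn are counts of true positives, false positives,
    false negatives and true negatives for the "infected" class.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical counts for illustration only; these are NOT data from the study.
metrics = classification_metrics(tp=80, fp=10, fn=20, tn=90)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Recall is the same quantity as sensitivity, so a report that lists both for one model should give them the same value.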
A total of 1,124 patients with HBV-ACLF were included, and the incidence of bacterial infection was 58.48%. Of these, 786 patients were assigned to the training set and 338 to the test set. The XGBoost model showed the best overall predictive performance. In the training set, the XGBoost model had an AUC of 0.940, an accuracy of 0.858, a precision of 0.881, a recall of 0.874, and an F1 score of 0.877. In the test set, it had an AUC of 0.930, an accuracy of 0.840, a precision of 0.887, a recall of 0.833, and an F1 score of 0.859. SHAP analysis identified six features driving the risk of bacterial infection: apolipoprotein A1, complement C3, D-dimer, C-reactive protein (CRP), total bilirubin (TBIL), and international normalized ratio (INR). Together these reflect the combined contribution of markers of inflammation, coagulation, and liver dysfunction. In subgroup analysis, the XGBoost model also showed good diagnostic performance in patients with HBV-ACLF complicated by ascites and in those with HBV-ACLF complicated by systemic inflammatory response syndrome (SIRS).
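As a quick arithmetic check on the figures above, the F1 score is the harmonic mean of precision and recall, so it can be recomputed from the other two reported values. The sketch below is illustrative only; the helper name `f1_from` and the dictionary layout are assumptions. The test-set value of 0.887 is treated here as precision, which is the reading consistent with the reported F1 score.

```python
def f1_from(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Values as reported in the abstract for the XGBoost model.
reported = {
    "training": {"precision": 0.881, "recall": 0.874, "f1": 0.877},
    "test": {"precision": 0.887, "recall": 0.833, "f1": 0.859},
}

# Recompute F1 for each cohort and compare with the reported value.
recomputed = {
    cohort: round(f1_from(v["precision"], v["recall"]), 3)
    for cohort, v in reported.items()
}
for cohort, value in recomputed.items():
    print(cohort, "recomputed F1:", value, "reported F1:", reported[cohort]["f1"])

# The cohort sizes imply an approximately 70/30 split of the 1,124 patients.
n_train, n_test = 786, 338
train_fraction = round(n_train / (n_train + n_test), 3)
print("training fraction:", train_fraction)
```

Both recomputed F1 values agree with the reported ones to three decimal places.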
A diagnostic model for bacterial infection in HBV-ACLF was constructed using XGBoost with six laboratory indicators. The model can facilitate early clinical diagnosis of bacterial infection in patients with HBV-ACLF.