Wang Qiaoli, Liang Tao, Li Yuexi, Zhou Peng, Liu Xiaoqin
Health Management Center, Deyang People's Hospital, Deyang, Sichuan, China.
Department of Gastroenterology, Deyang People's Hospital, Deyang, Sichuan, China.
Front Med (Lausanne). 2025 Jun 13;12:1587540. doi: 10.3389/fmed.2025.1587540. eCollection 2025.
OBJECTIVE: This study investigated the feasibility of developing machine learning models for non-invasive prediction of () infection from routinely collected adult health screening data, including demographic characteristics and clinical biomarkers, as a potential decision-support tool for clinical practice.
METHODS: Data were drawn from adult health examination records at the hospital's health management center. Least Absolute Shrinkage and Selection Operator (LASSO) regression was used for feature selection. Six machine learning algorithms were used to build predictive models, and their performance was evaluated. The SHapley Additive exPlanations (SHAP) method was used to visualize feature contributions and the predictions for individual cases.
RESULTS: The dataset included 10,393 subjects, of whom 3,278 (31.54%) had infection. Feature screening selected 10 factors for the prediction model. Among the six machine learning models, the Extra Trees model performed best, with an AUC of 0.827, accuracy of 0.744, and recall of 0.736. The Random Forest model also performed well, with an AUC of 0.810. XGBoost attained an AUC of 0.801, indicating moderate predictive capability. SHAP analysis ranked age, white blood cell count (WBC), albumin (ALB), gender, and waist circumference as the five most influential predictors of infection. Higher age, WBC, and waist circumference and lower ALB were associated with a higher probability of infection.
CONCLUSION: Among the evaluated models, the Extra Trees classifier performed best in predicting infection. SHAP analysis improved the interpretability of the model, which may support early clinical prediction and intervention strategies.
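The three metrics reported above (AUC, accuracy, and recall) can be illustrated with a minimal sketch. It is not the authors' code: the function names, the 0.5 decision threshold, and the toy labels and scores are all assumptions made for illustration. Only the cohort counts (3,278 of 10,393) come from the abstract.

```python
from typing import Sequence, Tuple


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs in which the
    positive case receives the higher score; ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def accuracy_and_recall(
    labels: Sequence[int], scores: Sequence[float], threshold: float = 0.5
) -> Tuple[float, float]:
    """Accuracy and recall (sensitivity) after thresholding the scores.
    The 0.5 default is an assumption; the abstract does not state a cutoff."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    tn = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 0)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    accuracy = (tp + tn) / len(labels)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, recall


# Hypothetical toy data (1 = infected, 0 = not infected); not from the study.
toy_labels = [1, 1, 1, 0, 0, 0, 0, 0]
toy_scores = [0.9, 0.6, 0.4, 0.7, 0.3, 0.2, 0.2, 0.1]

auc = roc_auc(toy_labels, toy_scores)
acc, rec = accuracy_and_recall(toy_labels, toy_scores)

# Prevalence check using the counts reported in the abstract.
prevalence = 3278 / 10393

print(f"AUC={auc:.3f} accuracy={acc:.3f} recall={rec:.3f}")
print(f"prevalence={prevalence:.2%}")
```

AUC is independent of any threshold, whereas accuracy and recall depend on the cutoff applied to the predicted probabilities, so the three reported values need not move together when the cutoff changes.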