Constructing a predictive model for early-onset sepsis in neonatal intensive care unit newborns based on SHapley Additive exPlanations explainable machine learning.

Author information

Tan Xuefeng, Zhang Xiufang, Chai Jie, Ji Wenjuan, Ru Jinling, Yang Cuilin, Zhou Wenjing, Bai Jing, Xiong Yueling

Affiliations

Department of Laboratory Medicine, The People's Hospital, Bozhou, China.

Translational Medicine Center, The Second Affiliated Hospital, Wannan Medical College, Wuhu, China.

Publication information

Transl Pediatr. 2024 Nov 30;13(11):1933-1946. doi: 10.21037/tp-24-278. Epub 2024 Nov 26.

DOI:10.21037/tp-24-278

PMID:39649648

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC11621883/

Abstract

BACKGROUND

The clinical signs of neonatal sepsis (NS) are subtle and non-specific, and the condition poses a serious threat to the lives of newborn infants. Early-onset sepsis (EOS), defined as sepsis occurring within 72 hours after birth, carries a high mortality rate. Identifying the key factors of NS and diagnosing it early are therefore of great practical significance. We developed a machine learning (ML) model for the early prediction of EOS in neonates admitted to the neonatal intensive care unit (NICU), investigated the pivotal risk factors associated with EOS development, and provided interpretable insights into the model's predictions.

METHODS

A retrospective cohort study was conducted. Of 668 newborns (EOS and non-EOS) admitted to the NICU of Bozhou People's Hospital from January to December 2023, 72 who had been born more than three days earlier and 166 whose medical records were missing more than 30% of the data were excluded, leaving 430 newborns (EOS and non-EOS) in the study. Clinical case data were analyzed, and the dataset was randomly partitioned, with 75% allocated to model training and the remaining 25% to testing. Data were preprocessed in R, and least absolute shrinkage and selection operator (LASSO) regression was used to select salient features and mitigate the risk of overfitting. Six ML models were used to predict the occurrence of EOS in neonates. Their predictive performance was evaluated using the receiver operating characteristic (ROC) curve and the precision-recall (PR) curve. The SHapley Additive exPlanations (SHAP) framework was then used to provide intuitive explanations for the predictions of the Categorical Boosting (CatBoost) model, which was the top performer.
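
The two evaluation curves named above each reduce to a single area statistic. The following is a minimal Python sketch of how those areas can be computed from labels and predicted scores, using only the standard library. It is not the authors' code (the abstract mentions R only for preprocessing and names no evaluation software), the function names and toy data are hypothetical, and the PR area is computed here as step-wise average precision, which may differ from the integration rule used in the study.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via its rank interpretation:
    the probability that a random positive case scores higher than
    a random negative case (ties count as one half)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise area under the precision-recall curve.
    Assumes no tied scores; ties would be broken by input order."""
    n_pos = sum(1 for y in labels if y == 1)
    if n_pos == 0:
        raise ValueError("at least one positive case is required")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    total = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank  # precision at each recalled positive
    return total / n_pos


# Hypothetical toy data, not taken from the study.
toy_labels = [1, 1, 0, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
print(roc_auc(toy_labels, toy_scores), average_precision(toy_labels, toy_scores))
```

The pairwise loop is quadratic in the number of cases, which is acceptable for a test set of roughly a hundred newborns; a rank-based formulation would be preferable for larger cohorts.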

RESULTS

The ROC area under the curve (ROCAUC) of all six ML models, namely CatBoost, random forest (RF), eXtreme Gradient Boosting (XGBoost), multilayer perceptron (MLP), support vector machine (SVM), and logistic regression (LR), exceeded 0.900 on the test set. The CatBoost model performed best, with favorable results in calibration, decision curve analysis (DCA), and learning curves. Its ROCAUC was 0.975 and its area under the PR curve (PRAUC) was 0.947, indicating high predictive accuracy. Using the SHAP method, seven key features were identified and ranked by importance: respiratory rate (RR), procalcitonin (PCT), nasal congestion (NC), yellow staining (YS), white blood cell count (WBC), fever, and amniotic fluid turbidity (AFT).
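
To make the idea behind the SHAP ranking concrete, the sketch below computes exact Shapley values for a single prediction of a toy scoring function by enumerating every feature coalition. Everything in it is an assumption for illustration: the function `toy_risk`, its coefficients, and the input and baseline values are invented and do not reflect the study's CatBoost model or its data. Features absent from a coalition are set to a single baseline value, whereas SHAP implementations typically use faster tree-specific or sampling-based estimators over a background dataset.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence


def shapley_values(
    predict: Callable[[Sequence[float]], float],
    x: Sequence[float],
    baseline: Sequence[float],
) -> List[float]:
    """Exact Shapley values for one prediction.
    Features outside a coalition take their baseline value."""
    n = len(x)

    def value(coalition: frozenset) -> float:
        mixed = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(mixed)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size that exclude feature i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_risk(features: Sequence[float]) -> float:
    """Hypothetical score with one interaction term; coefficients are invented."""
    rr, pct, wbc = features
    return 0.01 * rr + 0.2 * pct + 0.02 * wbc + 0.005 * rr * pct


feature_names = ["RR", "PCT", "WBC"]   # labels borrowed from the abstract
x = [60.0, 2.0, 15.0]                  # invented patient values
baseline = [40.0, 0.1, 10.0]           # invented reference values

phi = shapley_values(toy_risk, x, baseline)
for name, contribution in sorted(zip(feature_names, phi), key=lambda t: -abs(t[1])):
    print(name, round(contribution, 4))
```

The "additive" part of the name refers to the property that the contributions sum to the difference between the prediction for `x` and the prediction for the baseline. A global ranking such as the one reported above is commonly obtained by averaging absolute contributions over many patients. Exact enumeration grows exponentially with the number of features, so it is practical only for small feature sets like this one.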

CONCLUSIONS

By constructing an ML model and using the SHAP method for interpretability, this study identified crucial risk factors for EOS development in neonates. This approach enables early prediction of EOS risk, thereby facilitating timely and targeted clinical interventions for precise diagnosis and treatment.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d3de/11621883/d2d3a2387d06/tp-13-11-1933-f1.jpg
