Constructing a predictive model for early-onset sepsis in neonatal intensive care unit newborns based on SHapley Additive exPlanations explainable machine learning.

Author information

Tan Xuefeng, Zhang Xiufang, Chai Jie, Ji Wenjuan, Ru Jinling, Yang Cuilin, Zhou Wenjing, Bai Jing, Xiong Yueling

Affiliations

Department of Laboratory Medicine, The People's Hospital, Bozhou, China.

Translational Medicine Center, The Second Affiliated Hospital, Wannan Medical College, Wuhu, China.

Publication information

Transl Pediatr. 2024 Nov 30;13(11):1933-1946. doi: 10.21037/tp-24-278. Epub 2024 Nov 26.

Abstract

BACKGROUND

The clinical manifestations of neonatal sepsis (NS) are subtle and non-specific, and NS poses a serious threat to the lives of newborn infants. Early-onset sepsis (EOS), defined as sepsis occurring within 72 hours after birth, carries a high mortality rate. Identifying the key risk factors for NS and making an early diagnosis are therefore of great practical importance. We developed a robust machine learning (ML) model for the early prediction of EOS in neonates admitted to the neonatal intensive care unit (NICU), investigated the key risk factors associated with EOS, and provided interpretable explanations of the model's predictions.

METHODS

This retrospective cohort study screened 668 newborns (EOS and non-EOS) admitted to the NICU of Bozhou People's Hospital from January to December 2023. After excluding 72 newborns aged more than three days at admission and 166 newborns with more than 30% of medical record data missing, 430 newborns (EOS and non-EOS) were included. Clinical case data were analyzed, and the dataset was randomly partitioned into a training set (75%) and a test set (25%). Data preprocessing was performed in R, and least absolute shrinkage and selection operator (LASSO) regression was used to select salient features and reduce the risk of overfitting. Six ML models were trained to predict EOS in neonates, and their predictive performance was evaluated using receiver operating characteristic (ROC) and precision-recall (PR) curves. The SHapley Additive exPlanations (SHAP) framework was then used to explain the predictions of the best-performing model, Categorical Boosting (CatBoost).
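The abstract does not specify how the LASSO step was configured (loss function, how the penalty λ was chosen, or which R package was used). As a hedged illustration of why LASSO drops weak features, the sketch below is a minimal pure-Python cyclic coordinate descent for the squared-error LASSO objective (1/2n)||y - Xb||^2 + λ||b||_1 on standardized features. It is a simplified stand-in, not the study's pipeline: a binary EOS outcome would more typically be fitted with a logistic (binomial) LASSO and λ tuned by cross-validation. All function names and the toy data are hypothetical.

```python
import math


def standardize(column):
    """Center a feature column to mean 0 and scale it to unit (population) variance."""
    n = len(column)
    mean = sum(column) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in column) / n)
    if sd == 0.0:
        return [0.0] * n  # a constant column carries no information; its coefficient stays 0
    return [(v - mean) / sd for v in column]


def soft_threshold(z, gamma):
    """S(z, gamma) = sign(z) * max(|z| - gamma, 0); this step is what creates exact zeros."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_coordinate_descent(columns, y, lam, n_iter=200):
    """Minimize (1/2n) * ||y_c - X b||^2 + lam * ||b||_1 by cyclic coordinate descent.

    `columns` is a list of feature columns, each a list of n numbers. Features are
    standardized and y is centered (y_c), so no intercept is needed. Returns the
    coefficients on the standardized scale.
    """
    n = len(y)
    X = [standardize(col) for col in columns]
    y_mean = sum(y) / n
    residual = [v - y_mean for v in y]  # r = y_c - X b, and b starts at 0
    beta = [0.0] * len(X)
    for _ in range(n_iter):
        for j, xj in enumerate(X):
            # Least-squares coefficient for feature j given the others (uses mean(xj**2) == 1)
            rho = sum(xj[i] * residual[i] for i in range(n)) / n + beta[j]
            new_bj = soft_threshold(rho, lam)
            delta = new_bj - beta[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= delta * xj[i]  # keep r in sync with the updated coefficient
                beta[j] = new_bj
    return beta


# Hypothetical toy data: x_noise is constructed to be exactly uncorrelated with y and x_signal.
x_signal = [1, 2, 3, 4, 5, 6, 7, 8]
x_noise = [1, -1, -1, 1, 1, -1, -1, 1]
y = [0, 0, 0, 0, 1, 1, 1, 1]
print(lasso_coordinate_descent([x_signal, x_noise], y, lam=0.1))  # approximately [0.336, 0.0]
print(lasso_coordinate_descent([x_signal, x_noise], y, lam=1.0))  # [0.0, 0.0]
```

With λ = 0.1 the uninformative feature keeps a coefficient of exactly 0 while the informative one survives, shrunk by λ; raising λ to 1.0 zeroes both. Larger penalties therefore keep fewer features, which is how LASSO acts as a feature selector before the downstream models are trained.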

RESULTS

On the test set, the area under the ROC curve (ROCAUC) exceeded 0.900 for all six ML models: CatBoost, random forest (RF), eXtreme Gradient Boosting (XGBoost), multilayer perceptron (MLP), support vector machine (SVM), and logistic regression (LR). The CatBoost model performed best and also showed favorable calibration, decision curve analysis (DCA), and learning curves; its ROCAUC was 0.975 and its area under the PR curve (PRAUC) was 0.947, indicating strong discrimination. Using the SHAP method, seven key features were identified and ranked by importance: respiratory rate (RR), procalcitonin (PCT), nasal congestion (NC), yellow staining (YS, i.e., jaundice), white blood cell count (WBC), fever, and amniotic fluid turbidity (AFT).
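For readers who want to reproduce the two headline metrics, the sketch below computes them from binary labels (1 = EOS) and a model's predicted risk scores on a held-out set. ROC AUC is computed as the probability that a randomly chosen positive case scores above a randomly chosen negative case (ties count one half), and PR AUC is computed as average precision (step-wise interpolation). The abstract does not say which PR AUC estimator was used, and a trapezoidal PR AUC can differ slightly. The function names and toy scores are hypothetical.

```python
def roc_auc(labels, scores):
    """ROC AUC as P(score of a random positive > score of a random negative); ties count 0.5."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """PR AUC as average precision: sum over thresholds of (R_k - R_{k-1}) * P_k."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    n_pos = sum(labels)
    tp = fp = 0
    ap = 0.0
    prev_recall = 0.0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Treat all cases tied at this score as a single threshold step.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        recall = tp / n_pos
        precision = tp / (tp + fp)
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap


# Hypothetical held-out labels (1 = EOS) and predicted risks, for illustration only.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
print(roc_auc(labels, scores))            # 8/9, about 0.889
print(average_precision(labels, scores))  # 11/12, about 0.917
```

The chance level of ROC AUC is 0.5, whereas the expected average precision of a random ranking is roughly the proportion of positive cases, so a PRAUC such as 0.947 is best read alongside the EOS prevalence in the test set, which the abstract does not report. In Python, scikit-learn's `roc_auc_score` and `average_precision_score` compute the same two quantities.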

CONCLUSIONS

By building a precision-oriented ML model and using the SHAP method for interpretability, this study identified crucial risk factors for EOS in neonates. This approach enables early prediction of EOS risk, supporting timely and targeted clinical intervention for precise diagnosis and treatment.
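SHAP attributes each prediction to the input features using Shapley values from cooperative game theory. For tree ensembles such as CatBoost, SHAP implementations use fast tree-specific algorithms (TreeSHAP) rather than enumeration. To make the idea concrete, the sketch below brute-forces exact Shapley values for a hypothetical three-feature risk score, filling in "absent" features from a single baseline patient, and then ranks features by mean absolute SHAP value, a common global-importance summary. The abstract does not state exactly how the seven features were ranked, and the model, weights, feature names, and baseline here are invented for illustration.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline).

    phi_i = sum over S subset of (N minus {i}) of |S|!(n-|S|-1)!/n! * [v(S with i) - v(S)],
    where v(S) evaluates the model with features in S taken from x and all other
    features taken from `baseline`. Cost is O(2^n) model calls, so keep n small.
    """
    n = len(x)

    def value(coalition):
        z = [x[k] if k in coalition else baseline[k] for k in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [k for k in range(n) if k != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def rank_by_mean_abs_shap(model, rows, baseline, names):
    """Global importance: mean |SHAP value| per feature across rows, largest first."""
    totals = [0.0] * len(names)
    for row in rows:
        for j, v in enumerate(shapley_values(model, row, baseline)):
            totals[j] += abs(v)
    means = [t / len(rows) for t in totals]
    return sorted(zip(names, means), key=lambda pair: pair[1], reverse=True)


def toy_risk(z):
    """Hypothetical risk score: two main effects plus an f0 x f2 interaction."""
    return 0.8 * z[0] + 0.5 * z[1] + 0.3 * z[0] * z[2]


baseline = [0.0, 0.0, 0.0]  # hypothetical reference patient
patients = [[1.0, 2.0, 1.0], [0.5, -0.4, 2.0]]
print(shapley_values(toy_risk, patients[0], baseline))  # approximately [0.95, 1.0, 0.15]
print(rank_by_mean_abs_shap(toy_risk, patients, baseline, ["f0", "f1", "f2"]))
```

For each patient the Shapley values sum to model(x) - model(baseline), the additivity property that lets a SHAP plot decompose an individual risk prediction; the interaction term is split equally between f0 and f2.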
