

A machine learning model for predicting severe mycoplasma pneumoniae pneumonia in school-aged children.

Authors

Ye Yingying, Gao Zhenpeng, Zhang Zhiling, Chen Jianlong, Chu Chu, Zhou Weifang

Affiliation

Department of Infectious Diseases, Children's Hospital of Soochow University, No. 303, Jingde Road, Suzhou, China.

Publication information

BMC Infect Dis. 2025 Apr 21;25(1):570. doi: 10.1186/s12879-025-10958-8.

Abstract

OBJECTIVE

To develop an interpretable machine learning (ML) model that predicts severe Mycoplasma pneumoniae pneumonia (SMPP) and identifies reliable predictors of the disease's clinical type.

METHODS

We collected clinical data from 483 school-aged children with M. pneumoniae pneumonia (MPP) who were hospitalized at the Children's Hospital of Soochow University between September 2021 and June 2024. Between-group difference analysis and univariate logistic regression were used to select candidate predictors as training features for the ML models. Eight ML algorithms were used to build models based on the selected features, and their effectiveness was validated. The area under the curve (AUC), accuracy, five-fold cross-validation, and decision curve analysis (DCA) were used to evaluate model performance. Finally, the best-performing ML model was selected, and the Shapley Additive Explanations (SHAP) method was applied to rank the importance of clinical features and interpret the final model.
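The evaluation metrics named above can be illustrated with a minimal sketch. It is not the authors' code: the function names (`auc_score`, `accuracy`, `k_fold_indices`), the 0.5 classification threshold, and the toy labels and scores are all assumptions made for illustration. AUC is computed here as the fraction of positive/negative pairs in which the positive case receives the higher score, which is equivalent to the area under the ROC curve.

```python
import random


def auc_score(y_true, y_score):
    """AUC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def accuracy(y_true, y_score, threshold=0.5):
    """Fraction of cases classified correctly at an assumed probability threshold."""
    preds = [1 if s >= threshold else 0 for s in y_score]
    return sum(p == y for p, y in zip(preds, y_true)) / len(y_true)


def k_fold_indices(n, k=5, seed=0):
    """Split indices 0..n-1 into k shuffled folds; return (train, test) index pairs."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    return [(sorted(set(idx) - set(fold)), sorted(fold)) for fold in folds]


# Hypothetical toy data: 1 = severe (SMPP), 0 = non-severe.
toy_labels = [0, 0, 1, 1, 0, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9]

toy_auc = auc_score(toy_labels, toy_scores)
toy_acc = accuracy(toy_labels, toy_scores)
splits = k_fold_indices(483, k=5)  # 483 matches the cohort size in the abstract
```

In five-fold cross-validation each (train, test) pair would be used to fit a model on the training indices and score it on the held-out fold, with the metric averaged over the five folds.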

RESULTS

After feature selection, 30 variables remained. We constructed eight ML models and assessed their effectiveness, finding that the CatBoost model exhibited the best predictive performance, with an AUC of 0.934 and an accuracy of 0.9175. DCA was used to compare the clinical benefit of the models and showed that the CatBoost model provided a greater net benefit than the other ML models within the threshold probability range of 34% to 75%. Additionally, we applied the SHAP method to interpret the CatBoost model and used SHAP plots to visualize the influence of each predictor variable on the outcome. The results identified the top six risk factors as the number of days with fever, D-dimer, platelet count (PLT), C-reactive protein (CRP), lactate dehydrogenase (LDH), and the neutrophil-to-lymphocyte ratio (NLR).
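To make the net benefit used in decision curve analysis concrete, here is a small sketch based on the standard definition: net benefit = TP/n - (FP/n) × pt/(1 - pt), where pt is the threshold probability. This is not the study's analysis; the function names and the ten toy cases are assumptions for illustration only. A decision curve plots this quantity across a range of thresholds for each model and for the "treat all" reference strategy.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of treating every case whose predicted risk is >= threshold."""
    if not 0 < threshold < 1:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold probability.
    return tp / n - (fp / n) * threshold / (1 - threshold)


def net_benefit_treat_all(y_true, threshold):
    """Reference strategy: treat every patient regardless of predicted risk."""
    if not 0 < threshold < 1:
        raise ValueError("threshold must lie strictly between 0 and 1")
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Hypothetical toy cohort: 3 severe cases out of 10, with made-up predicted risks.
dca_labels = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
dca_probs = [0.9, 0.8, 0.3, 0.6, 0.2, 0.1, 0.1, 0.05, 0.4, 0.3]

nb_model = net_benefit(dca_labels, dca_probs, threshold=0.5)
nb_all = net_benefit_treat_all(dca_labels, threshold=0.5)
```

A model is considered clinically useful at a given threshold when its net benefit exceeds both the "treat all" value and zero (the "treat none" strategy).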

CONCLUSIONS

The interpretable CatBoost model can help physicians accurately identify school-aged children with SMPP. This early identification facilitates better treatment options and timely prevention of complications. Furthermore, the SHAP algorithm enhances the model's transparency and increases its trustworthiness in practical applications.
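The idea behind SHAP attributions can be sketched with a brute-force Shapley computation on a toy scoring function. This is an illustration under stated assumptions, not the study's model and not the SHAP library's algorithm: `toy_risk`, its coefficients, the baseline and patient values, and the use of a single baseline patient to represent "missing" features are all hypothetical. Each feature's attribution is its average marginal contribution over every order in which features can be switched from the baseline value to the patient's value.

```python
from itertools import permutations


def shapley_values(model, instance, baseline):
    """Exact Shapley values by enumerating all feature orderings.

    Features not yet "revealed" take their baseline value. Cost grows
    factorially with the number of features, so this only suits tiny examples.
    """
    names = list(instance)
    totals = {name: 0.0 for name in names}
    orderings = list(permutations(names))
    for order in orderings:
        current = dict(baseline)
        previous = model(current)
        for name in order:
            current[name] = instance[name]
            updated = model(current)
            totals[name] += updated - previous  # marginal contribution
            previous = updated
    return {name: total / len(orderings) for name, total in totals.items()}


def toy_risk(x):
    """Hypothetical risk score with one interaction term (not the CatBoost model)."""
    return (0.2 * x["fever_days"] + 0.5 * x["d_dimer"] + 0.01 * x["crp"]
            + 0.1 * x["fever_days"] * x["d_dimer"])


# Made-up reference patient and patient of interest.
baseline = {"fever_days": 5, "d_dimer": 0.5, "crp": 10}
patient = {"fever_days": 10, "d_dimer": 2.0, "crp": 30}

phi = shapley_values(toy_risk, patient, baseline)
```

The attributions sum to the difference between the patient's score and the baseline score, which is the additivity property that makes SHAP-style rankings of clinical features readable.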


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2ea7/12013137/3ba8134cc92c/12879_2025_10958_Fig1_HTML.jpg
