Interpretable machine-learning model for Predicting the Convalescent COVID-19 patients with pulmonary diffusing capacity impairment.

Affiliations

Hubei University of Chinese Medicine, Wuhan, 430065, China.

Hubei Provincial Hospital of Traditional Chinese Medicine, Wuhan, 430061, China.

Publication information

BMC Med Inform Decis Mak. 2023 Aug 29;23(1):169. doi: 10.1186/s12911-023-02192-6.

DOI:10.1186/s12911-023-02192-6

PMID:37644543

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10466769/

Abstract

INTRODUCTION

Patients in the convalescent stage of COVID-19 show notable pulmonary diffusing capacity impairment (PDCI). Pulmonary diffusing capacity is a frequently used indicator of pulmonary function prognosis in COVID-19 survivors, but few studies have focused on predicting it in this population. This study aimed to develop and validate a machine learning (ML) model that predicts PDCI in COVID-19 patients from routinely available clinical data, thereby assisting clinical diagnosis.

METHODS

Data were collected in a follow-up study, conducted from August to September 2021, of 221 hospitalized COVID-19 survivors from Wuhan 18 months after discharge, and included demographic characteristics and clinical examination results. The data were randomly split into a training set (80%) and a validation set (20%). Six popular machine learning models were developed to predict the pulmonary diffusing capacity of COVID-19 patients in the recovery stage. Model performance was assessed by area under the curve (AUC), accuracy, recall, precision, positive predictive value (PPV), negative predictive value (NPV), and F1 score. The best-performing model was taken as the optimal model and used in the interpretability analysis. The MAHAKIL method was used to balance the sample distribution, and the RFECV method (recursive feature elimination with cross-validation) was used to select a feature combination more favorable to machine learning.
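The 80/20 split and the listed performance indicators can be illustrated with a minimal, standard-library Python sketch. It is not the authors' code: the function names, the random seed, and the toy labels and scores are assumptions made only for illustration. It also makes explicit that precision and PPV are the same quantity.

```python
import random


def train_validation_split(rows, validation_fraction=0.2, seed=0):
    """Shuffle rows and return (training, validation) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seed is an arbitrary assumption
    n_validation = round(len(shuffled) * validation_fraction)
    return shuffled[n_validation:], shuffled[:n_validation]


def _safe_div(numerator, denominator):
    """Return 0.0 instead of raising when a denominator is zero."""
    return numerator / denominator if denominator else 0.0


def classification_metrics(y_true, y_pred):
    """Compute the threshold-based indicators from binary labels (1 = PDCI)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = _safe_div(tp, tp + fp)
    recall = _safe_div(tp, tp + fn)
    return {
        "accuracy": _safe_div(tp + tn, tp + fp + tn + fn),
        "recall": recall,
        "precision": precision,
        "ppv": precision,  # PPV is identical to precision
        "npv": _safe_div(tn, tn + fn),
        "f1": _safe_div(2 * precision * recall, precision + recall),
    }


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random
    negative, with ties counted as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(
        1.0 if p > n else (0.5 if p == n else 0.0) for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # 221 placeholder patient indices, matching the cohort size in the text.
    train, validation = train_validation_split(range(221))
    print(len(train), len(validation))

    # Toy labels and scores, invented for illustration only.
    y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
    y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
    scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.4, 0.2, 0.1, 0.05, 0.35]
    print(classification_metrics(y_true, y_pred))
    print(roc_auc(y_true, scores))
```

With 221 records, rounding 20% gives 44 validation and 177 training records. The pairwise AUC computation is quadratic in the sample size, which is acceptable for a cohort of this size but would normally be replaced by a rank-based implementation.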

RESULTS

A total of 221 COVID-19 survivors discharged from hospitals in Wuhan were recruited. Of these participants, 117 (52.94%) were female, and the median age was 58.2 years (standard deviation (SD) = 12). After feature selection, 31 of the 37 clinical factors were used to construct the model. Among the six ML models tested, XGBoost performed best, with an AUC of 0.755 and an accuracy of 78.01% after experimental verification. The SHapley Additive exPlanations (SHAP) summary analysis showed that hemoglobin (Hb), maximal voluntary ventilation (MVV), severity of illness, platelet count (PLT), uric acid (UA), and blood urea nitrogen (BUN) were the six most important factors affecting the XGBoost model's decisions.
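The idea behind SHAP attributions can be sketched by computing exact Shapley values by brute force for a tiny model. Everything in the example is hypothetical: the scoring function `toy_score`, its coefficients, and the patient and baseline values are invented for illustration and are not taken from the study. The SHAP library itself uses faster model-specific algorithms for tree ensembles such as XGBoost rather than this enumeration.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, instance, baseline):
    """Exact Shapley values of predict(instance) relative to predict(baseline).

    Features outside a coalition are set to their baseline value.
    The cost grows as 2**n, so this is only usable for a few features.
    """
    names = list(instance)
    n = len(names)

    def value(coalition):
        mixed = {
            name: instance[name] if name in coalition else baseline[name]
            for name in names
        }
        return predict(mixed)

    phi = {}
    for name in names:
        others = [other for other in names if other != name]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                coalition = set(subset)
                total += weight * (value(coalition | {name}) - value(coalition))
        phi[name] = total
    return phi


def toy_score(x):
    """Hypothetical additive risk score; the coefficients are made up."""
    return 2.0 - 0.01 * x["Hb"] - 0.005 * x["MVV"] + 0.2 * x["severe"]


if __name__ == "__main__":
    patient = {"Hb": 110, "MVV": 80, "severe": 1}     # invented values
    reference = {"Hb": 140, "MVV": 100, "severe": 0}  # invented baseline
    for feature, contribution in shapley_values(toy_score, patient, reference).items():
        print(feature, round(contribution, 6))
```

The attributions always sum to the difference between the prediction for the instance and the prediction for the baseline. For a purely additive model like this toy one, each attribution reduces to the coefficient times the feature's deviation from the baseline; for a tree ensemble, interactions between features are shared among the features involved.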

CONCLUSION

The XGBoost model reported here showed good prognostic prediction ability for PDCI in COVID-19 survivors during the recovery period. In the interpretation based on SHAP value importance, Hb and MVV contributed the most to the prediction of PDCI outcomes in these survivors.
