Hubei University of Chinese Medicine, Wuhan, 430065, China.
Hubei Provincial Hospital of Traditional Chinese Medicine, Wuhan, 430061, China.
BMC Med Inform Decis Mak. 2023 Aug 29;23(1):169. doi: 10.1186/s12911-023-02192-6.
Patients in the convalescent stage of COVID-19 commonly have pulmonary diffusing capacity impairment (PDCI). Pulmonary diffusing capacity is a frequently used indicator of pulmonary function prognosis in COVID-19 survivors, but few studies have focused on predicting it in this population. The aim of this study was to develop and validate a machine learning (ML) model that predicts PDCI in COVID-19 patients from routinely available clinical data, thereby assisting clinical diagnosis.
Data were collected in a follow-up study, conducted from August to September 2021, of 221 hospitalized COVID-19 survivors from Wuhan 18 months after discharge, and included demographic characteristics and clinical examination results. The data were randomly split into a training set (80%) and a validation set (20%). Six popular machine learning models were developed to predict the pulmonary diffusing capacity of COVID-19 patients in the recovery stage. Model performance was assessed by area under the curve (AUC), accuracy, recall, precision, positive predictive value (PPV), negative predictive value (NPV) and F1 score. The best-performing model was designated the optimal model and carried forward to the interpretability analysis. The MAHAKIL method was used to balance the sample distribution, and recursive feature elimination with cross-validation (RFECV) was used to select the feature combination most favorable to machine learning.
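As an illustration of the performance indicators listed above, the following is a minimal sketch of how they can be computed for a binary PDCI classifier. It is not the authors' code: the function names and the toy labels and scores are hypothetical, and the AUC is computed with the pairwise (Mann-Whitney) formulation rather than any particular library routine. Note that precision and PPV are the same quantity by definition, so the sketch reports them under one key.

```python
from typing import Dict, Sequence


def safe_div(numerator: float, denominator: float) -> float:
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute threshold-based metrics for binary labels (1 = PDCI, 0 = no PDCI)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    recall = safe_div(tp, tp + fn)   # sensitivity
    ppv = safe_div(tp, tp + fp)      # identical to precision
    npv = safe_div(tn, tn + fn)
    return {
        "accuracy": safe_div(tp + tn, tp + fp + tn + fn),
        "recall": recall,
        "precision_ppv": ppv,
        "npv": npv,
        "f1": safe_div(2 * ppv * recall, ppv + recall),
    }


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: true labels and predicted probabilities of PDCI.
toy_labels = [1, 1, 1, 0, 0, 0, 0, 1]
toy_scores = [0.9, 0.8, 0.4, 0.35, 0.2, 0.6, 0.1, 0.7]
toy_predictions = [1 if s >= 0.5 else 0 for s in toy_scores]  # assumed 0.5 cut-off

toy_metrics = classification_metrics(toy_labels, toy_predictions)
toy_auc = roc_auc(toy_labels, toy_scores)
```

The threshold-based indicators depend on the assumed 0.5 cut-off, whereas the AUC depends only on how the scores rank positives against negatives, which is why the two families of indicators can disagree about which model is best.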
A total of 221 COVID-19 survivors discharged from hospitals in Wuhan were recruited in this study. Of these participants, 117 (52.94%) were female, with a median age of 58.2 years (standard deviation (SD) = 12). After feature selection, 31 of the 37 clinical factors were retained to construct the model. Among the six ML models tested, the XGBoost model performed best, with an AUC of 0.755 and an accuracy of 78.01% after experimental verification. The SHapley Additive exPlanations (SHAP) summary analysis showed that hemoglobin (Hb), maximal voluntary ventilation (MVV), severity of illness, platelets (PLT), uric acid (UA) and blood urea nitrogen (BUN) were the six most important factors affecting the XGBoost model's decision-making.
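To make the idea behind the SHAP analysis concrete, the sketch below computes exact Shapley values by brute force for a tiny, hypothetical three-feature model. This is an assumption-laden illustration only: it is not the SHAP library, not the tree-specific algorithm typically used with XGBoost, and the model `toy_model`, the inputs, and the baseline are invented. "Absent" features are represented by replacing them with a baseline value, which is one common convention.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence


def exact_shapley(
    model: Callable[[Sequence[float]], float],
    x: Sequence[float],
    baseline: Sequence[float],
) -> List[float]:
    """Exact Shapley value of each feature for one prediction.

    Features in a coalition take their value from x; the rest take the
    baseline value. Cost grows as 2^n, so this only suits tiny examples.
    """
    n = len(x)

    def coalition_value(present: set) -> float:
        mixed = [x[i] if i in present else baseline[i] for i in range(n)]
        return model(mixed)

    shapley = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Weight of a coalition of this size in the Shapley average.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                present = set(subset)
                gain = coalition_value(present | {i}) - coalition_value(present)
                shapley[i] += weight * gain
    return shapley


def toy_model(v: Sequence[float]) -> float:
    """Hypothetical score with two additive terms and one interaction."""
    return 2 * v[0] + 3 * v[1] + v[0] * v[2]


patient = [1.0, 1.0, 1.0]    # hypothetical feature values
reference = [0.0, 0.0, 0.0]  # hypothetical baseline
contributions = exact_shapley(toy_model, patient, reference)
```

The contributions sum to the difference between the model output for the patient and for the baseline, which is the additivity property that lets SHAP rank features by their mean absolute contribution across patients. In this toy case the interaction term is split equally between the two features involved.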
The XGBoost model reported here showed good ability to predict PDCI in COVID-19 survivors during the recovery period. In the interpretation based on SHAP value importance, Hb and MVV contributed the most to the prediction of PDCI outcomes in these survivors.