Department of Biostatistics, School of Public Health, Hamadan University of Medical Sciences, Hamadan, Iran.
Modeling of Noncommunicable Diseases Research Center, School of Public Health, Hamadan University of Medical Sciences, Street of Shahid Fahmideh, P.O. BOX: 6517838736, Hamadan, Iran.
BMC Med Inform Decis Mak. 2022 Jul 24;22(1):192. doi: 10.1186/s12911-022-01939-x.
Because mortality among COVID-19 patients is high, a highly accurate yet interpretable model for classifying patient mortality could support urgent, appropriate action and help reduce deaths. In this study, random forest was used to select the features associated with COVID-19 mortality, and classification was then performed on the important features using the logistic model tree (LMT), classification and regression tree (CART), C4.5, and C5.0 algorithms.
In this retrospective study, data from 2470 COVID-19 patients admitted to hospitals in Hamadan, western Iran, were used; 75.02% of the patients recovered and 24.98% died. First, random forest was used to select, from 25 demographic, clinical, and laboratory findings, the features with a relative importance above 6%. LMT, C4.5, C5.0, and CART trees were then developed, and classification performance was evaluated using recall, accuracy, and F1-score on the training, test, and total datasets. Finally, the best-performing tree was developed, and its receiver operating characteristic curve and area under the curve (AUC) were reported.
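The two bookkeeping steps in this workflow, thresholding features by relative importance and scoring a classifier with recall, accuracy, and F1-score, can be sketched in plain Python. This is a minimal illustration, not the authors' code: it assumes that "relative importance" means each feature's share of the total importance, and the function names, feature names, importance scores, and labels are all hypothetical.

```python
from typing import Dict, List, Sequence


def select_features(importances: Dict[str, float], threshold: float = 0.06) -> List[str]:
    """Keep features whose share of the total importance exceeds `threshold`.

    Assumption: relative importance = importance / sum of all importances.
    """
    total = sum(importances.values())
    if total <= 0:
        raise ValueError("importances must sum to a positive value")
    return [name for name, value in importances.items() if value / total > threshold]


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, recall, and F1-score for binary labels (1 = positive class)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {"accuracy": accuracy, "recall": recall, "f1": f1}


# Hypothetical importance scores, for illustration only (not the study's values).
toy_importances = {"age": 30.0, "bun": 25.0, "ptt": 20.0, "esr": 15.0, "fever": 5.0, "cough": 5.0}
selected = select_features(toy_importances)

# Hypothetical true and predicted labels for eight patients.
y_true = [1, 1, 1, 1, 0, 0, 1, 0]
y_pred = [1, 1, 1, 0, 1, 0, 1, 0]
metrics = classification_metrics(y_true, y_pred)

print(selected)
print(metrics)
```

In practice the importance scores would come from a fitted random forest and the predictions from the fitted trees; the sketch only shows how the 6% cut-off and the three evaluation criteria are applied once those numbers exist.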
The results showed that, among demographic and clinical features, gender and age, and among laboratory findings, blood urea nitrogen, partial thromboplastin time, serum glutamic-oxaloacetic transaminase, and erythrocyte sedimentation rate had a relative importance above 6%. Of the trees built on these features, CART performed best, with F1-score, accuracy, and recall of 0.8681, 0.7824, and 0.955, respectively, on the test dataset and 0.8667, 0.7834, and 0.9385, respectively, on the total dataset. The AUC for CART was 79.5%.
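The abstract reports F1-score and recall but not precision. Assuming the standard definition F1 = 2PR / (P + R), the precision implied by the reported values can be recovered by solving for P. The sketch below is derived arithmetic, not a figure reported by the authors, and it inherits the rounding of the published values; the function name is hypothetical.

```python
def implied_precision(f1: float, recall: float) -> float:
    """Solve F1 = 2PR / (P + R) for precision P, given F1 and recall R."""
    denominator = 2 * recall - f1
    if denominator <= 0:
        raise ValueError("F1 and recall are inconsistent with the standard F1 definition")
    return f1 * recall / denominator


# (F1-score, recall) pairs as reported for CART in the abstract.
reported = {"test": (0.8681, 0.955), "total": (0.8667, 0.9385)}

for name, (f1, recall) in reported.items():
    print(f"{name}: implied precision ~ {implied_precision(f1, recall):.3f}")
```

Under these assumptions the implied precision is roughly 0.80 on both the test and total datasets.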
An accurate and interpretable model for classifying a clinically consequential outcome is critical at all stages of care, including treatment and immediate decision making. In this study, CART classified COVID-19 patient mortality with high accuracy and prioritized the important demographic, clinical, and laboratory findings in an interpretable format. It can therefore identify risk factors for the prognosis of mortality in COVID-19 patients and enable health professionals and physicians to make immediate and appropriate decisions.