Mohammadi-Pirouz Zahra, Hajian-Tilaki Karimollah, Sadeghi Haddat-Zavareh Mahmoud, Amoozadeh Abazar, Bahrami Shabnam
Student Research Center, Research Institute, Babol University of Medical Sciences, Babol, Iran.
Department of Biostatistics and Epidemiology, School of Public Health, Babol University of Medical Sciences, Babol, Iran.
Int J Emerg Med. 2024 Sep 27;17(1):126. doi: 10.1186/s12245-024-00681-7.
The accurate prediction of COVID-19 mortality risk, considering influencing factors, is crucial in guiding effective public policies to alleviate the strain on the healthcare system. As such, this study aimed to assess the efficacy of decision tree algorithms (CART, C5.0, and CHAID) in predicting COVID-19 mortality risk and to compare their performance with that of the logistic regression model.
This retrospective cohort study examined 5080 patients with COVID-19 in Babol, a city in northern Iran, who tested positive for the virus by PCR between March 2020 and March 2022. To check the validity of the findings, the data were randomly divided into an 80% training set and a 20% testing set. The prediction models, namely the logistic regression model and the decision tree algorithms, were trained on the 80% training data and evaluated on the 20% testing data. The performance of these methods on the test samples was assessed using measures including the ROC curve, sensitivity, specificity, and AUC.
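The split-and-evaluate workflow described here can be sketched as follows. This is a minimal illustration assuming a scikit-learn pipeline; the input file, column names, and model settings are hypothetical, and scikit-learn's CART-style DecisionTreeClassifier stands in for the tree algorithms (C5.0 and CHAID are not available in scikit-learn and would need separate packages).

```python
# Minimal sketch of the 80/20 train/test evaluation described above,
# assuming a scikit-learn workflow. The file name, column names, and model
# settings are illustrative; DecisionTreeClassifier is a CART-style tree,
# and C5.0/CHAID would require dedicated packages.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import roc_auc_score, confusion_matrix

# Hypothetical cohort file with predictors similar to those reported
df = pd.read_csv("covid_cohort.csv")
X = df[["age", "icu_admission", "intubation", "kidney_disease",
        "bun", "crp", "wbc", "nlr", "o2_sat", "hemoglobin"]]
y = df["death"]  # 1 = died, 0 = survived

# Random 80% training / 20% testing split, stratified on the outcome
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

models = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "CART-style tree": DecisionTreeClassifier(max_depth=5, random_state=42),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    y_prob = model.predict_proba(X_test)[:, 1]

    # Sensitivity, specificity, and AUC on the held-out 20%
    tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()
    print(f"{name}: sensitivity={tp / (tp + fn):.2f}, "
          f"specificity={tn / (tn + fp):.2f}, "
          f"AUC={roc_auc_score(y_test, y_prob):.2f}")
```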
The findings revealed that the mortality rate among COVID-19 patients admitted to hospital was 7.7%. Cross-validation showed that the CHAID algorithm outperformed the other decision tree algorithms and logistic regression in specificity and precision, but not in sensitivity, when predicting the risk of COVID-19 mortality. The CHAID algorithm achieved a specificity, precision, accuracy, and F-score of 0.98, 0.70, 0.95, and 0.52, respectively. All models indicated that ICU admission, intubation, age, kidney disease, BUN, CRP, WBC, NLR, O2 saturation, and hemoglobin were among the factors influencing the mortality rate of COVID-19 patients.
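For reference, the reported performance measures follow the standard confusion-matrix definitions (TP = true positives, TN = true negatives, FP = false positives, FN = false negatives):

Sensitivity = TP / (TP + FN)
Specificity = TN / (TN + FP)
Precision = TP / (TP + FP)
Accuracy = (TP + TN) / (TP + TN + FP + FN)
F-score = 2 x Precision x Sensitivity / (Precision + Sensitivity)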
The CART and C5.0 models performed better in sensitivity, but CHAID outperformed the other decision tree algorithms in specificity, precision, and accuracy, and showed a slight improvement over the logistic regression method in predicting the risk of COVID-19 mortality in the study population.