Gharehhasani Bita Shokri, Rezaei Mansour, Naghipour Armin, Sayad Nazanine, Mostafaei Shayan, Alimohammadi Ehsan
Taleghani Hospital Kermanshah University of Medical Sciences Kermanshah Iran.
Social Development and Health Promotion Research Center Kermanshah University of Medical Sciences Kermanshah Iran.
Health Sci Rep. 2024 Jul 25;7(7):e2266. doi: 10.1002/hsr2.2266. eCollection 2024 Jul.
Death due to COVID-19 is one of the biggest health challenges in the world. Many models can predict death due to COVID-19. This study aimed to fit and compare Decision Tree (DT), Support Vector Machine (SVM), and AdaBoost models for predicting death due to COVID-19.
Variables were described using mean (SD) and frequency (%). The relationship between each variable and death due to COVID-19 was assessed with the chi-square test at a significance level of 0.05. The DT, SVM, and AdaBoost models were compared on sensitivity, specificity, accuracy, and the area under the receiver operating characteristic (ROC) curve. Analyses were performed in R using the psych, caTools, ROSE (Random Over-Sampling Examples), rpart, and rpart.plot packages.
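As a rough illustration of the chi-square step described above, the following is a minimal Python sketch of a Pearson chi-square test of independence on a 2x2 table (for example, a binary symptom against death). The study itself used R; the function name `chi_square_2x2` and the counts in `example_table` are hypothetical and are not taken from the study data. No continuity correction is applied, which is an assumption, since the abstract does not say which variant was used.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square test of independence for a 2x2 table.

    `table` is [[a, b], [c, d]] of observed counts.
    Returns (statistic, p_value) with 1 degree of freedom,
    without a continuity correction (an assumption of this sketch).
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected

    # For 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Hypothetical counts: rows = symptom present/absent, columns = died/survived
example_table = [[30, 70], [10, 90]]
statistic, p_value = chi_square_2x2(example_table)
print(f"chi-square = {statistic:.2f}, p = {p_value:.4f}")
print("significant at 0.05" if p_value < 0.05 else "not significant at 0.05")
```

A variable would be flagged as significantly related to the outcome when the returned p-value falls below the 0.05 level stated in the methods.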
Of the 23,054 patients studied, 10,935 (46.5%) were women and 12,569 (53.5%) were men. The mean age of the patients was 54.9 ± 21.0 years. Gender, fever, cough, muscle pain, smell and taste, abdominal pain, nausea and vomiting, diarrhea, anorexia, dizziness, chest pain, intubation, cancer, diabetes, chronic blood disease, impaired immunity, pregnancy, dialysis, and chronic lung disease each showed a statistically significant relationship with death in COVID-19 patients (P < 0.05). The sensitivity, specificity, accuracy, and area under the receiver operating characteristic curve were, respectively, 0.60, 0.68, 0.71, and 0.75 for the DT model; 0.54, 0.62, 0.63, and 0.71 for the SVM model; and 0.59, 0.65, 0.69, and 0.74 for the AdaBoost model.
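For readers unfamiliar with the four metrics reported above, the following is a minimal Python sketch of how sensitivity, specificity, accuracy, and the area under the ROC curve can be computed from true labels and predicted scores. It is an illustration only, not the authors' R code. The function names, the 0.5 threshold, and the toy labels and scores are all assumptions made for this example.

```python
def classification_metrics(y_true, y_score, threshold=0.5):
    """Sensitivity, specificity, and accuracy at a given score threshold.

    Labels are 1 (death) or 0 (survival); the threshold is an assumption.
    """
    tp = fp = tn = fn = 0
    for label, score in zip(y_true, y_score):
        predicted = 1 if score >= threshold else 0
        if label == 1 and predicted == 1:
            tp += 1
        elif label == 1 and predicted == 0:
            fn += 1
        elif label == 0 and predicted == 0:
            tn += 1
        else:
            fp += 1
    sensitivity = tp / (tp + fn)          # true positive rate
    specificity = tn / (tn + fp)          # true negative rate
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return sensitivity, specificity, accuracy


def roc_auc(y_true, y_score):
    """AUC as the probability that a random positive outscores a random
    negative, counting ties as one half (pairwise rank formulation)."""
    positives = [s for y, s in zip(y_true, y_score) if y == 1]
    negatives = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy data, not taken from the study
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_score = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.1, 0.7]

sensitivity, specificity, accuracy = classification_metrics(y_true, y_score)
auc = roc_auc(y_true, y_score)
print(sensitivity, specificity, accuracy, auc)
```

The pairwise AUC computation is quadratic in the number of patients, so it suits small illustrations; for a cohort of this size a rank-based routine from a statistics library would be the practical choice.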
The DT model showed higher predictive power than the SVM and AdaBoost models. Researchers in other fields may therefore consider DT for predicting their outcome variables. Future studies could also try other approaches, such as random forest or XGBoost, to improve accuracy.