Muhammad L J, Islam Md Milon, Usman Sani Sharif, Ayon Safial Islam
Department of Mathematics and Computer Science, Faculty of Science, Federal University of Kashere, P.M.B. 0182, Gombe, Nigeria.
Department of Computer Science and Engineering, Khulna University of Engineering & Technology, Khulna, 9203 Bangladesh.
SN Comput Sci. 2020;1(4):206. doi: 10.1007/s42979-020-00216-w. Epub 2020 Jun 21.
The novel coronavirus (COVID-19 or 2019-nCoV) pandemic has neither a clinically proven vaccine nor proven drugs; however, patients are recovering with the aid of antibiotic medications, anti-viral drugs, and chloroquine, as well as vitamin C supplementation. It is now evident that the world needs a faster solution to contain the further spread of COVID-19, using non-clinical approaches such as data mining, augmented intelligence, and other artificial intelligence techniques, so as to mitigate the huge burden on the healthcare system while providing the best possible means for the diagnosis and prognosis of patients. In this study, data mining models were developed to predict the recovery of COVID-19 infected patients using an epidemiological dataset of COVID-19 patients in South Korea. The decision tree, support vector machine, naive Bayes, logistic regression, random forest, and K-nearest neighbor algorithms were applied directly to the dataset using the Python programming language to develop the models. The models predicted the minimum and maximum number of days for COVID-19 patients to recover from the virus, the age groups of patients at high risk of not recovering, those who are likely to recover, and those who are likely to recover quickly. The results show that the model developed with the decision tree algorithm was the most effective at predicting the possibility of recovery of infected patients, with an overall accuracy of 99.85%, making it the best of the models developed, ahead of those built with the support vector machine, naive Bayes, logistic regression, random forest, and K-nearest neighbor algorithms.
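The abstract describes training several classifiers on the same dataset and ranking them by overall accuracy. A minimal sketch of that comparison follows, under stated assumptions: it uses only the Python standard library, implements just two of the six algorithm families from scratch (K-nearest neighbor and Gaussian naive Bayes), and runs on synthetic two-feature data rather than the South Korean dataset. All function names (`make_synthetic_data`, `knn_predict`, `fit_gaussian_nb`, `compare_models`) are hypothetical, and the abstract does not say which libraries or evaluation protocol the authors used.

```python
import math
import random
from collections import Counter


def make_synthetic_data(n, seed=0):
    """Synthetic stand-in data (NOT the study's dataset): two Gaussian clusters."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        label = 1 if rng.random() < 0.5 else 0
        center = 3.0 if label == 1 else 0.0
        features = (rng.gauss(center, 1.0), rng.gauss(center, 1.0))
        rows.append((features, label))
    return rows


def knn_predict(train, x, k=5):
    """Majority vote among the k training rows closest to x (Euclidean distance)."""
    nearest = sorted(train, key=lambda row: math.dist(row[0], x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def fit_gaussian_nb(train):
    """Estimate a class prior and per-feature mean/variance for each class."""
    by_class = {}
    for features, label in train:
        by_class.setdefault(label, []).append(features)
    model = {}
    for label, rows in by_class.items():
        stats = []
        for column in zip(*rows):
            mean = sum(column) / len(column)
            # Small constant keeps the variance strictly positive.
            var = sum((v - mean) ** 2 for v in column) / len(column) + 1e-9
            stats.append((mean, var))
        model[label] = (len(rows) / len(train), stats)
    return model


def gaussian_nb_predict(model, x):
    """Return the class with the highest log posterior under the fitted model."""
    def log_posterior(label):
        prior, stats = model[label]
        score = math.log(prior)
        for value, (mean, var) in zip(x, stats):
            score += -0.5 * math.log(2 * math.pi * var) - (value - mean) ** 2 / (2 * var)
        return score
    return max(model, key=log_posterior)


def accuracy(predict, test):
    """Fraction of held-out rows whose predicted label matches the true label."""
    return sum(predict(x) == y for x, y in test) / len(test)


def compare_models(n=400, seed=0):
    """Train on 80% of the data and report hold-out accuracy for each model."""
    data = make_synthetic_data(n, seed)
    split = int(0.8 * len(data))
    train, test = data[:split], data[split:]
    nb_model = fit_gaussian_nb(train)
    return {
        "k-nearest neighbor": accuracy(lambda x: knn_predict(train, x), test),
        "naive Bayes": accuracy(lambda x: gaussian_nb_predict(nb_model, x), test),
    }


if __name__ == "__main__":
    for name, acc in compare_models().items():
        print(f"{name}: {acc:.2%}")
```

The same harness could be extended to the other algorithms named in the abstract by adding one entry per model to the dictionary returned by `compare_models`. A single hold-out split is used here for brevity. The accuracies it prints describe only the synthetic data and say nothing about the figures reported in the study.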