El Boujnouni Hamoucha, Rahouti Mohamed, El Boujnouni Mohamed
Research Center of Plant and Microbial Biotechnologies, Biodiversity, and Environment, Faculty of Sciences, Mohammed V University in Rabat, PO Box 1014, Morocco.
Laboratory of Information Technologies, National School of Applied Sciences, Chouaib Doukkali University in El Jadida, PO Box 1166, Morocco.
Inform Med Unlocked. 2021;24:100577. doi: 10.1016/j.imu.2021.100577. Epub 2021 Apr 20.
COVID-19 is an infectious disease caused by the newly discovered SARS-CoV-2 virus. The virus causes a respiratory tract infection; symptoms include dry cough, fever, tiredness and, in more severe cases, breathing difficulty. SARS-CoV-2 is extremely contagious and is spreading rapidly all over the world, and the scientific community is working tirelessly to find an effective treatment. This paper aims to determine the origin of the virus by comparing its nucleic acid sequence with those of all members of the Coronaviridae family. The study uses a new approach that combines three techniques: n-grams (for text categorization), Principal Component Analysis (for dimensionality reduction), and the Random Forest algorithm (for supervised classification). The experimental results show that a large set of SARS-CoV-2 genomes, collected from different locations around the world, present significant similarities to coronavirus genomes found in pangolins. This finding agrees with previous results obtained by other methods, which also suggest that pangolins should be considered as possible hosts in the emergence of the new coronavirus.
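The three-stage approach described above (n-gram features, PCA, Random Forest) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state the n-gram length, the number of principal components, or the forest settings, so `k=3`, `n_components=5`, and `n_estimators=100` are assumptions, and the sequences and the labels `host_A` and `host_B` are synthetic placeholders rather than real genomes. The sketch assumes NumPy and scikit-learn are installed.

```python
import random
from itertools import product

import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import make_pipeline


def kmer_frequencies(seq, k=3):
    """Return the normalized counts of all 4**k nucleotide n-grams (k-mers)."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    counts = np.zeros(len(kmers))
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])  # windows with ambiguous bases are skipped
        if j is not None:
            counts[j] += 1
    total = counts.sum()
    return counts / total if total else counts


def random_genome(rng, weights, length=500):
    """Generate a synthetic sequence with the given A, C, G, T weights."""
    return "".join(rng.choices("ACGT", weights=weights, k=length))


rng = random.Random(0)

# Synthetic training set: two hypothetical host classes that differ in
# base composition (placeholder data, not real coronavirus genomes).
sequences, labels = [], []
for _ in range(20):
    sequences.append(random_genome(rng, [0.35, 0.15, 0.15, 0.35]))
    labels.append("host_A")
    sequences.append(random_genome(rng, [0.15, 0.35, 0.35, 0.15]))
    labels.append("host_B")

# Stage 1: n-gram feature vectors, one row per sequence.
X = np.array([kmer_frequencies(s) for s in sequences])

# Stages 2 and 3: PCA for dimensionality reduction, then a Random Forest.
model = make_pipeline(
    PCA(n_components=5),  # assumed value
    RandomForestClassifier(n_estimators=100, random_state=0),  # assumed values
)
model.fit(X, labels)

# Classify an unseen synthetic sequence.
query = random_genome(rng, [0.35, 0.15, 0.15, 0.35])
predicted = model.predict(kmer_frequencies(query).reshape(1, -1))[0]
print("Predicted class for the query sequence:", predicted)
```

In a real analysis the rows of `X` would come from labeled Coronaviridae genomes, and the class predicted for each SARS-CoV-2 genome would indicate which reference group it most resembles.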