Identification of SARS-CoV-2 origin: Using Ngrams, principal component analysis and Random Forest algorithm.

Author information

El Boujnouni Hamoucha, Rahouti Mohamed, El Boujnouni Mohamed

Affiliations

Research Center of Plant and Microbial Biotechnologies, Biodiversity, and Environment, Faculty of Sciences, Mohammed V University in Rabat, PO Box 1014, Morocco.

Laboratory of Information Technologies, National School of Applied Sciences, Chouaib Doukkali University in El Jadida, PO Box 1166, Morocco.

Publication information

Inform Med Unlocked. 2021;24:100577. doi: 10.1016/j.imu.2021.100577. Epub 2021 Apr 20.

Abstract

COVID-19 is an infectious disease caused by the newly discovered SARS-CoV-2 virus. The virus causes a respiratory tract infection whose symptoms include dry cough, fever, tiredness and, in more severe cases, breathing difficulty. SARS-CoV-2 is extremely contagious and is spreading rapidly around the world, and the scientific community is working tirelessly to find an effective treatment. This paper aims to determine the origin of the virus by comparing its nucleic acid sequence with those of all members of the Coronaviridae family. The study uses a new approach that combines three techniques: N-grams (for text categorization), principal component analysis (for dimensionality reduction) and the Random Forest algorithm (for supervised classification). The experimental results show that a large set of SARS-CoV-2 genomes, collected from different locations around the world, present significant similarities to coronavirus genomes found in pangolins. This finding confirms some previous results obtained by other methods, which also suggest that pangolins should be considered as possible hosts in the emergence of the new coronavirus.
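The abstract describes the pipeline only at a high level. As a minimal sketch of its first step, the Python below turns nucleotide sequences into N-gram (k-mer) frequency vectors. The n-gram length, the use of overlapping windows, normalisation to relative frequencies and the skipping of windows that contain non-ACGT symbols are all assumptions rather than details taken from the paper, and the function names are hypothetical. In a full pipeline the resulting matrix would then be reduced with PCA and passed to a Random Forest classifier (for example scikit-learn's `PCA` and `RandomForestClassifier`); those steps are not shown here.

```python
from __future__ import annotations

from itertools import product


def ngram_vocabulary(n: int, alphabet: str = "ACGT") -> list[str]:
    """Hypothetical helper: every possible n-gram over `alphabet` (4**n for ACGT), in a fixed order."""
    return ["".join(p) for p in product(alphabet, repeat=n)]


def ngram_frequencies(sequence: str, n: int) -> list[float]:
    """Hypothetical helper: relative frequency of each n-gram in `sequence`.

    Assumptions (not from the paper): windows overlap by n - 1 bases, and any
    window containing a symbol outside ACGT (e.g. the ambiguity code N) is skipped.
    """
    vocab = ngram_vocabulary(n)
    index = {gram: i for i, gram in enumerate(vocab)}
    counts = [0] * len(vocab)
    seq = sequence.upper()
    total = 0
    # Slide a window of length n across the sequence one base at a time.
    for start in range(len(seq) - n + 1):
        i = index.get(seq[start:start + n])
        if i is not None:  # window made only of A, C, G, T
            counts[i] += 1
            total += 1
    # Normalise so genomes of different lengths are comparable.
    return [c / total for c in counts] if total else [0.0] * len(vocab)


def feature_matrix(sequences: list[str], n: int) -> list[list[float]]:
    """One fixed-length row per genome, ready for dimensionality reduction."""
    return [ngram_frequencies(s, n) for s in sequences]
```

For example, `ngram_frequencies("ACGTAC", 2)` counts the five overlapping windows AC, CG, GT, TA and AC, so AC receives 0.4 and CG, GT and TA each receive 0.2. Indexing against a fixed vocabulary of all 4**n n-grams keeps every genome's vector the same length, which both PCA and a classifier require; the vocabulary grows quickly with n (n = 8 already gives 65,536 columns), which is one reason a dimensionality-reduction step is useful.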

Graphical abstract: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c113/8056990/73e1833c1879/gr1_lrg.jpg