Unsupervised cluster analysis of SARS-CoV-2 genomes reflects its geographic progression and identifies distinct genetic subgroups of SARS-CoV-2 virus.

Author information

Department of Biostatistics, T.H. Chan School of Public Health, Harvard University, Boston, Massachusetts, USA.

Department of Medical Consilience, Graduate School, Dankook University, Yongin-si, South Korea.

Publication information

Genet Epidemiol. 2021 Apr;45(3):316-323. doi: 10.1002/gepi.22373. Epub 2021 Jan 8.

Abstract

Over 10,000 viral genome sequences of the SARS-CoV-2 virus have been made readily available during the ongoing coronavirus pandemic since the initial genome sequence of the virus was released on the open access Virological website (http://virological.org/) early on January 11. We utilize the published data on the single-stranded RNAs of 11,132 SARS-CoV-2 patients in the GISAID database, which contains fully or partially sequenced SARS-CoV-2 samples from laboratories around the world. Among the many important research questions currently being investigated, one pertains to the genetic characterization and classification of the virus. We analyze the nucleotide sequencing data and geographic information of a subset of 7,640 SARS-CoV-2 patients in the GISAID database whose records have no missing entries. Instead of modeling the mutation rate, applying phylogenetic tree approaches, and so forth, we utilize a model-free clustering approach that compares the viruses at a genome-wide level. We apply principal component analysis to a similarity matrix that compares all pairs of these SARS-CoV-2 nucleotide sequences at all loci simultaneously, using the Jaccard index. Our analysis of the SARS-CoV-2 genome data illustrates the geographic and chronological progression of the virus, from the first cases observed in China to the current wave of cases in Europe and North America. This is in line with a phylogenetic analysis that we use to contrast our results. We also observe that, based on their sequence data, the SARS-CoV-2 viruses cluster in distinct genetic subgroups. Whether these genetic subgroups are related to disease outcome, and what implications they may have for vaccine development, is the subject of ongoing research.
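The clustering step described in the abstract (a Jaccard similarity matrix over all pairs of genomes, followed by principal component analysis) can be illustrated with a minimal sketch. This is not the authors' code. It assumes that the sequences are already aligned and of equal length, and that each genome is encoded as a binary vector marking the loci where it differs from a reference sequence; the abstract does not specify the encoding. The function names and the toy sequences are hypothetical, and PCA is implemented here as an SVD of the column-centered similarity matrix, which is one common way to do it.

```python
import numpy as np


def mutation_matrix(sequences, reference):
    """Binary matrix X with X[i, j] = 1 where sequence i differs from the reference at locus j.

    Assumes all sequences are aligned to the reference and have the same length.
    """
    ref = np.array(list(reference))
    seqs = np.array([list(s) for s in sequences])
    return (seqs != ref).astype(int)


def jaccard_similarity(X):
    """Pairwise Jaccard index |A intersect B| / |A union B| between the rows of a binary matrix."""
    intersection = X @ X.T
    counts = X.sum(axis=1)
    union = counts[:, None] + counts[None, :] - intersection
    # Convention chosen for this sketch: two genomes with no mutations at all
    # have an empty union and are treated as identical (similarity 1).
    sim = np.ones(intersection.shape, dtype=float)
    np.divide(intersection, union, out=sim, where=union > 0)
    return sim


def principal_components(S, k=2):
    """Scores on the top-k principal components of the column-centered matrix S."""
    centered = S - S.mean(axis=0)
    U, s, _ = np.linalg.svd(centered, full_matrices=False)
    return U[:, :k] * s[:k]


# Toy data (made up for illustration): five aligned 10-nucleotide "genomes".
reference = "ACGTACGTAC"
sequences = [
    "ACGTACGTAC",  # identical to the reference
    "ACGTACGTAT",  # differs at locus 9
    "TCGTACGTAT",  # differs at loci 0 and 9
    "ACGAACCTAC",  # differs at loci 3 and 6
    "ACGAACCTAG",  # differs at loci 3, 6 and 9
]

X = mutation_matrix(sequences, reference)
sim = jaccard_similarity(X)
pcs = principal_components(sim, k=2)

print(np.round(sim, 3))
print(np.round(pcs, 3))
```

Plotting the first two columns of `pcs` and coloring each point by sampling location or collection date is the kind of view in which geographic or chronological structure would become visible. For thousands of genomes the full pairwise matrix is still feasible in memory, but a sparse representation of `X` may be preferable.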

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4758/8005425/90d15d493003/nihms-1650988-f0001.jpg
