Comparative Genomics Reveals Early Emergence and Biased Spatiotemporal Distribution of SARS-CoV-2.

Affiliations

Department of Biosciences, University of Milan, Milan, Italy.

Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies, Consiglio Nazionale delle Ricerche, Bari, Italy.

Publication information

Mol Biol Evol. 2021 May 19;38(6):2547-2565. doi: 10.1093/molbev/msab049.

Abstract

Effective systems for the analysis of molecular data are fundamental for monitoring the spread of infectious diseases and studying pathogen evolution. The rapid identification of emerging viral strains and/or genetic variants potentially associated with novel phenotypic features is one of the most important objectives of genomic surveillance of human pathogens and represents one of the first lines of defense for the control of their spread. During the COVID-19 pandemic, several taxonomic frameworks have been proposed for the classification of SARS-CoV-2 isolates. These systems, which are typically based on phylogenetic approaches, represent essential tools for epidemiological studies and contribute to the study of the origin of the outbreak. Here, we propose an alternative, reproducible, and transparent phenetic method to study changes in SARS-CoV-2 genomic diversity over time. We suggest that our approach can complement other systems and facilitate the identification of biologically relevant variants in the viral genome. To demonstrate the validity of our approach, we present comparative genomic analyses of more than 175,000 genomes. Our method delineates 22 distinct SARS-CoV-2 haplogroups, which, based on the distribution of high-frequency genetic variants, fall into four major macrohaplogroups. We highlight biased spatiotemporal distributions of SARS-CoV-2 genetic profiles and show that seven of the 22 haplogroups (and all four haplogroup clusters) showed a broad geographic distribution within China by the time the outbreak was widely recognized, suggesting early emergence and widespread cryptic circulation of the virus well before its isolation in January 2020.
General patterns of genomic variability are remarkably similar within all major SARS-CoV-2 haplogroups, with UTRs consistently exhibiting the greatest variability and with s2m, a conserved secondary structure element of unknown function in the 3'-UTR of the viral genome, showing evidence of a functional shift. Although several polymorphic sites that are specific to one or more haplogroups were predicted to be under positive or negative selection, overall our analyses suggest that the emergence of novel types is unlikely to be driven by convergent evolution and independent fixation of advantageous substitutions, or by selection of recombined strains. In the absence of extensive clinical metadata for most available genome sequences, and in the context of extensive geographic and temporal biases in the sampling, many questions regarding the evolution and clinical characteristics of SARS-CoV-2 isolates remain open. However, our data indicate that the approach outlined here can be usefully employed in the identification of candidate SARS-CoV-2 genetic variants of clinical and epidemiological importance.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3b8a/8136509/b3e1581a683e/msab049f1.jpg
