Nash D. Rochman, Yuri I. Wolf, Guilhem Faure, Pascal Mutz, Feng Zhang, Eugene V. Koonin
National Center for Biotechnology Information, National Library of Medicine, Bethesda, MD 20894.
Broad Institute of MIT and Harvard, Cambridge, MA 02142.
bioRxiv. 2021 Mar 2:2020.10.12.336644. doi: 10.1101/2020.10.12.336644.
Understanding the trends in SARS-CoV-2 evolution is paramount to controlling the COVID-19 pandemic. We analyzed more than 300,000 high-quality genome sequences of SARS-CoV-2 variants available as of January 2021. The results show that the ongoing evolution of SARS-CoV-2 during the pandemic is characterized primarily by purifying selection, but a small set of sites appears to evolve under positive selection. The receptor-binding domain of the spike protein and the nuclear localization signal (NLS)-associated region of the nucleocapsid protein are enriched with positively selected amino acid replacements. These replacements form a strongly connected network of apparent epistatic interactions and are signatures of major partitions in the SARS-CoV-2 phylogeny. Virus diversity within each geographic region has grown steadily throughout the pandemic, but analysis of the phylogenetic distances between pairs of regions reveals four distinct periods, defined by the global partitioning of the tree and the emergence of key mutations. The initial period of rapid diversification into region-specific phylogenies, which ended in February 2020, was followed by a major extinction event and global homogenization concomitant with the spread of D614G in the spike protein, ending in March 2020. The NLS-associated variants across multiple partitions rose to global prominence from March to July 2020, during a period of stasis in inter-regional diversity. Finally, beginning in July 2020, multiple mutations, some of which have since been demonstrated to enable antibody evasion, began to emerge in association with ongoing regional diversification, which might be indicative of speciation.