Department of Biochemistry and Molecular Biology, McGovern Medical School at The University of Texas Health Science Center at Houston, Houston, TX 77030, USA.
School of Life Sciences, Fudan University, Shanghai 200433, China.
Mol Phylogenet Evol. 2021 Apr;157:107017. doi: 10.1016/j.ympev.2020.107017. Epub 2020 Nov 24.
The COVID-19 pandemic is caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), whose origin is still shrouded in mystery. In this study, we developed a method to search for the basal SARS-CoV-2 clade among collected SARS-CoV-2 genome sequences. We first identified the mutation sites in the SARS-CoV-2 whole-genome sequence alignment. Then, by pairwise comparison of the numbers of mutation sites among all SARS-CoV-2 sequences, we identified the least mutated clade, which is the basal clade under the parsimony principle. In our first analysis, we used 168 SARS-CoV-2 sequences (GISAID dataset up to 2020/03/04) to identify the basal clade, which contains 33 identical viral sequences from seven countries. To our surprise, in our second analysis with 367 SARS-CoV-2 sequences (GISAID dataset up to 2020/03/17), the basal clade contains 51 viral sequences, 18 more than in the first analysis. The much larger NCBI dataset shows that this clade had expanded to 85 unique sequences by 2020/04/04. The expanding basal clade indicates that the least mutated SARS-CoV-2 sequence had been replicating and spreading for at least four months. Coronaviruses are known to have an RNA proofreading capability that ensures the fidelity of their genome replication. Interestingly, we found that SARS-CoV-2 without its nonstructural proteins 13 to 16 (Nsp13-Nsp16) exhibits an unusually high mutation rate. Our results suggest that SARS-CoV-2 has an unprecedented RNA proofreading capability that can preserve its genome intact even after a long period of transmission. Our selection analyses also indicate that the positive selection event enabling SARS-CoV-2 to cross species and adapt to human hosts might have occurred before the outbreak.
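The two-step search described in the abstract (find mutation sites in the alignment, then compare sequences pairwise by their number of differing sites) can be illustrated with a minimal sketch. This is only one possible reading of the abstract, not the authors' implementation: the function names `variable_sites` and `least_mutated_group` are hypothetical, the input is assumed to be a gap-free, upper-case alignment held in a plain dictionary, and "least mutated" is approximated here as the group of identical sequences with the smallest total number of differences from all other sequences.

```python
from collections import defaultdict

VALID = set("ACGT")  # assumption: only unambiguous bases count as evidence


def variable_sites(alignment):
    """Return the column indices where at least two different bases occur."""
    seqs = list(alignment.values())
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all sequences must have the same aligned length")
    sites = []
    for i in range(length):
        bases = {s[i] for s in seqs if s[i] in VALID}
        if len(bases) > 1:
            sites.append(i)
    return sites


def least_mutated_group(alignment):
    """Return (names, sites): the group of identical sequences with the
    fewest total pairwise differences, and the variable sites used."""
    sites = variable_sites(alignment)

    # Group sequences that are identical at every variable site.
    groups = defaultdict(list)
    for name, seq in alignment.items():
        key = "".join(seq[i] for i in sites)
        groups[key].append(name)

    # For each group, sum its differences against every sequence in the data.
    totals = {}
    for key in groups:
        total = 0
        for other, members in groups.items():
            diff = sum(
                1
                for a, b in zip(key, other)
                if a != b and a in VALID and b in VALID
            )
            total += diff * len(members)
        totals[key] = total

    # Ties are broken by input order (a simplification of this sketch).
    best = min(totals, key=totals.get)
    return sorted(groups[best]), sites
```

Calling `least_mutated_group` on a dictionary that maps sequence names to aligned genome strings returns the names in the candidate basal group together with the mutation-site columns. A real analysis would also need to handle gaps, ambiguous bases, and sequencing errors, which this sketch treats only crudely.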