China National Center for Bioinformation, Beijing 100101, China; National Genomics Data Center & CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China.
Genomics Proteomics Bioinformatics. 2020 Dec;18(6):648-663. doi: 10.1016/j.gpb.2020.10.003. Epub 2021 Feb 11.
COVID-19 and its causative pathogen SARS-CoV-2 have plunged the world into a staggering pandemic within a few months, and the global fight against both has been intensifying. Here, we describe an analysis procedure in which genome composition and its variables are related, through the genetic code, to molecular mechanisms, based on an understanding of RNA replication and its feedback loop from mutation to viral proteome sequence fraternity, including effective sites on the replicase-transcriptase complex. Our analysis starts with primary sequence information, an identity-based phylogeny built from 22,051 SARS-CoV-2 sequences, and an evaluation of sequence variation patterns as mutation spectra and their 12 permutations among organized clades. All are tailored to two key mechanisms: strand-biased and function-associated mutations. Our findings are as follows: 1) The most dominant mutation is the C-to-U permutation, whose abundant second-codon-position counts alter amino acid composition toward higher molecular weight and lower hydrophobicity, although most are assumed to be slightly deleterious. 2) The second most abundant group includes three negative-strand mutations (U-to-C, A-to-G, and G-to-A) and a positive-strand mutation (G-to-U) due to DNA repair mechanisms after cellular abasic events. 3) A clade-associated biased mutation trend is found and is attributable to an elevated level of negative-sense strand synthesis. 4) Within-clade permutation variation is very informative for associating non-synonymous mutations with viral proteome changes. These findings demand a platform where emerging mutations are mapped onto mostly subtle but fast-adjusting viral proteomes and transcriptomes, to provide biological and clinical information, after logical convergence, for effective pharmaceutical and diagnostic applications. Such actions are desperately needed, especially in the middle of the war against COVID-19.
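To make the idea of a mutation spectrum over the 12 permutations concrete, the following is a minimal Python sketch and not the authors' pipeline. It tallies each of the 12 ordered substitution types between a reference and an aligned variant sequence, and records the codon position (1, 2, or 3) of each change. The function name `mutation_spectrum`, the parameter `cds_start`, and the toy sequences are hypothetical. The sketch also assumes a single open reading frame beginning at `cds_start`, which is a simplification of the real SARS-CoV-2 genome layout.

```python
from collections import Counter
from itertools import permutations

BASES = "ACGU"
# The 12 ordered substitution types, e.g. ("C", "U") stands for C-to-U.
SUBSTITUTIONS = list(permutations(BASES, 2))


def mutation_spectrum(reference, variant, cds_start=0):
    """Count substitutions by type and by codon position.

    Assumptions (illustrative only): both sequences are already aligned to
    equal length, and one reading frame starts at index `cds_start`.
    """
    if len(reference) != len(variant):
        raise ValueError("sequences must be aligned to the same length")

    # Work in the RNA alphabet so DNA-style input (T) is also accepted.
    ref = reference.upper().replace("T", "U")
    var = variant.upper().replace("T", "U")

    by_type = Counter()
    by_type_and_codon_pos = Counter()
    for i, (r, v) in enumerate(zip(ref, var)):
        # Skip identical sites, gaps, and ambiguous bases such as N.
        if r == v or r not in BASES or v not in BASES:
            continue
        by_type[(r, v)] += 1
        if i >= cds_start:
            codon_pos = (i - cds_start) % 3 + 1  # 1, 2, or 3
            by_type_and_codon_pos[(r, v, codon_pos)] += 1
    return by_type, by_type_and_codon_pos


# Toy example: two C-to-U changes and one G-to-A change.
ref_seq = "AUGGCUCCAGGU"
var_seq = "AUGGUUCUAGAU"
spectrum, codon_spectrum = mutation_spectrum(ref_seq, var_seq)
for sub in SUBSTITUTIONS:
    if spectrum[sub]:
        print(f"{sub[0]}-to-{sub[1]}: {spectrum[sub]}")
```

Grouping such counts by clade, rather than by a single pair of sequences, would give per-clade spectra of the kind the abstract compares. That grouping step is omitted here because the clade assignments are not given in this record.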