Oak Ridge National Laboratory, Biosciences Division, Oak Ridge, TN, USA.
The Bredesen Center for Interdisciplinary Research and Graduate Education, University of Tennessee Knoxville, Knoxville, TN, USA.
Genome Biol. 2020 Dec 23;21(1):304. doi: 10.1186/s13059-020-02191-0.
A mechanistic understanding of the spread of SARS-CoV-2 and diligent tracking of ongoing mutagenesis are of key importance to plan robust strategies for confining its transmission. Large numbers of available sequences and their dates of transmission provide an unprecedented opportunity to analyze evolutionary adaptation in novel ways. Addition of high-resolution structural information can reveal the functional basis of these processes at the molecular level. Integrated systems biology-directed analyses of these data layers afford valuable insights to build a global understanding of the COVID-19 pandemic.
Here we identify globally distributed haplotypes from 15,789 SARS-CoV-2 genomes and model their success based on their duration, dispersal, and frequency in the host population. Our models identify mutations that are likely compensatory adaptive changes that allowed for rapid expansion of the virus. Functional predictions from structural analyses indicate that, contrary to previous reports, the Asp614Gly mutation in the spike glycoprotein (S) likely reduced transmission and the subsequent Pro323Leu mutation in the RNA-dependent RNA polymerase led to the precipitous spread of the virus. Our model also suggests that two mutations in the nsp13 helicase allowed for the adaptation of the virus to the Pacific Northwest of the USA. Finally, our explainable artificial intelligence algorithm identified a mutational hotspot in the sequence of S that also displays a signature of positive selection and may have implications for tissue or cell-specific expression of the virus.
These results provide valuable insights for the development of drugs and surveillance strategies to combat the current and future pandemics.
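As an illustration of the three quantities on which haplotype success is modeled above (duration, dispersal, and frequency), the following minimal Python sketch computes them from a toy set of sampled genomes. It is not the authors' model: the `Genome` record, the `summarize_haplotypes` function, the region labels, and the sample data are all hypothetical, and dispersal is approximated here simply as the number of distinct sampling regions.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Genome:
    haplotype: str   # hypothetical label for a set of co-occurring variants
    collected: date  # sampling date of the genome
    region: str      # sampling location (hypothetical coarse label)


def summarize_haplotypes(genomes):
    """Return {haplotype: (duration_days, n_regions, frequency)}.

    duration_days: days between first and last sampling of the haplotype
    n_regions:     number of distinct regions where it was sampled
    frequency:     fraction of all genomes that carry it
    """
    groups = defaultdict(list)
    for genome in genomes:
        groups[genome.haplotype].append(genome)

    total = len(genomes)
    summary = {}
    for haplotype, members in groups.items():
        dates = [g.collected for g in members]
        duration_days = (max(dates) - min(dates)).days
        n_regions = len({g.region for g in members})
        frequency = len(members) / total
        summary[haplotype] = (duration_days, n_regions, frequency)
    return summary


# Toy data, invented for illustration only.
genomes = [
    Genome("A", date(2020, 1, 5), "Asia"),
    Genome("A", date(2020, 1, 20), "Asia"),
    Genome("B", date(2020, 2, 1), "Europe"),
    Genome("B", date(2020, 3, 15), "North America"),
    Genome("B", date(2020, 4, 1), "Europe"),
]
summary = summarize_haplotypes(genomes)
```

In this toy data, haplotype "B" persists longer, reaches more regions, and is more frequent than "A". How such measures are combined into a single success score is a modeling choice that this sketch deliberately leaves open.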