Bonetti Franceschi Vinicius, Volz Erik
Department of Infectious Disease Epidemiology, School of Public Health, Imperial College London, London, England, W2 1PG, UK.
Wellcome Open Res. 2024 Jul 24;9:85. doi: 10.12688/wellcomeopenres.20704.2. eCollection 2024.
Large-scale sequencing of SARS-CoV-2 has enabled the study of viral evolution during the COVID-19 pandemic. Some viral mutations may be advantageous to viral replication within hosts but detrimental to transmission, and therefore carry only a transient fitness advantage. By affecting the number of descendants, persistence times and growth rates of associated clades, these mutations generate localised imbalance in phylogenies. Quantifying these features in closely related clades with and without recurring mutations can elucidate the tradeoffs between within-host replication and between-host transmission.
We implemented a novel phylogenetic clustering algorithm (mlscluster, https://github.com/mrc-ide/mlscluster) to systematically search time-scaled phylogenies for mutations under transient/multilevel selection. We applied this method to a SARS-CoV-2 time-calibrated phylogeny with >1.2 million sequences from England, and used Poisson regressions and summary statistics to characterise recurrent mutations that may influence transmission fitness across PANGO-lineages and genomic regions.
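The sister-clade comparison described above can be illustrated with a minimal Python sketch. It is not the mlscluster implementation: the `Clade` record, the function names, and both threshold values are hypothetical and chosen only for illustration, and growth rates are left out for brevity. The sketch flags a mutation-carrying clade when it has both fewer sampled descendants and a shorter persistence time than its sister clade without the mutation.

```python
from dataclasses import dataclass


@dataclass
class Clade:
    """Hypothetical summary of one clade in a time-scaled phylogeny."""
    n_tips: int          # number of sampled descendants
    first_sample: float  # earliest tip date (decimal year)
    last_sample: float   # latest tip date (decimal year)

    @property
    def persistence(self) -> float:
        # Time span over which the clade was observed
        return self.last_sample - self.first_sample


def sister_ratios(mutant: Clade, sister: Clade) -> dict:
    """Ratios of clade size and persistence, mutant relative to sister."""
    if sister.n_tips <= 0 or sister.persistence <= 0:
        raise ValueError("sister clade needs tips and a positive time span")
    return {
        "size_ratio": mutant.n_tips / sister.n_tips,
        "persistence_ratio": mutant.persistence / sister.persistence,
    }


def is_candidate(mutant: Clade, sister: Clade,
                 max_size_ratio: float = 0.5,
                 max_persistence_ratio: float = 0.5) -> bool:
    """Flag clades that are both smaller and shorter-lived than their sister.

    Both thresholds are illustrative assumptions, not values from the study.
    """
    r = sister_ratios(mutant, sister)
    return (r["size_ratio"] <= max_size_ratio
            and r["persistence_ratio"] <= max_persistence_ratio)


if __name__ == "__main__":
    mutant = Clade(n_tips=10, first_sample=2021.0, last_sample=2021.1)
    sister = Clade(n_tips=100, first_sample=2021.0, last_sample=2021.5)
    print(sister_ratios(mutant, sister))
    print(is_candidate(mutant, sister))
```

In a real analysis the same mutation would need to be flagged in several independent clades (a homoplasy) before it is treated as a candidate; this sketch covers a single clade pair only.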
We found no major differences across two epidemic stages (before and after Omicron), PANGO-lineages, and genomic regions. However, spike, nucleocapsid, and ORF3a were proportionally more enriched for transmission fitness polymorphisms (TFP)-homoplasies than other proteins. We provide a catalog of SARS-CoV-2 sites under multilevel selection, which can guide experimental investigations within and beyond the spike protein.
This study provides empirical evidence that tradeoffs between within-host replication and between-host transmission shape the fitness landscape of SARS-CoV-2. The method offers a fast and scalable way to screen large sequence databases for sites under putative multilevel selection, which may then warrant confirmatory analyses and experimental validation.