Department of Biology and Center for Genomics and Systems Biology, New York University, New York, NY 10003.
School of Natural Sciences, Massey University, Auckland 0745, New Zealand.
Proc Natl Acad Sci U S A. 2023 Jan 31;120(5):e2206945119. doi: 10.1073/pnas.2206945119. Epub 2023 Jan 24.
Quantifying SARS-like coronavirus (SL-CoV) evolution is critical to understanding the origins of SARS-CoV-2 and the molecular processes that could underlie future epidemic viruses. While genomic analyses suggest recombination was a factor in the emergence of SARS-CoV-2, few studies have quantified recombination rates among SL-CoVs. Here, we infer recombination rates of SL-CoVs from correlated substitutions in sequencing data using a coalescent model with recombination. Our computationally efficient, non-phylogenetic method infers recombination parameters of both sampled sequences and the unsampled gene pools with which they recombine. We apply this approach to infer recombination parameters for a range of positive-sense RNA viruses. We then analyze a set of 191 SL-CoV sequences (including SARS-CoV-2) and find that ORF1ab and S genes frequently undergo recombination. We identify which SL-CoV sequence clusters have recombined with shared gene pools, and show that these pools have distinct structures and high recombination rates, with multiple recombination events occurring per synonymous substitution. We find that individual genes have recombined with different viral reservoirs. By decoupling contributions from mutation and recombination, we recover the phylogeny of non-recombined portions for many of these SL-CoVs, including the position of SARS-CoV-2 in this clonal phylogeny. Lastly, by analyzing >400,000 SARS-CoV-2 whole-genome sequences, we show that current diversity levels are insufficient to infer the within-population recombination rate of the virus since the pandemic began. Our work offers new methods for inferring recombination rates in RNA viruses with implications for understanding recombination in SARS-CoV-2 evolution and the structure of clonal relationships and gene pools shaping its origins.
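To make the phrase "correlated substitutions" concrete, the following is a minimal sketch, not the authors' implementation. It assumes a gap-free alignment of equal-length sequences and uses every site, whereas the abstract refers to synonymous substitutions. It also does not fit a coalescent model. The function name `substitution_profile` and its parameters are hypothetical. For each pair of sequences it marks the sites that differ, then estimates the probability of a difference at site i + l given a difference at site i, averaged over all pairs.

```python
from itertools import combinations


def substitution_profile(seqs, max_lag):
    """Estimate pairwise diversity and a correlated-substitution profile.

    Returns (d_s, profile), where d_s is the mean fraction of differing
    sites over all sequence pairs and profile[l - 1] is the estimated
    probability of a difference at site i + l given a difference at site i.
    Assumption: sequences are aligned, equal length, and any character
    mismatch counts as a substitution.
    """
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to equal length")

    diff_sum = 0                      # total differing sites over all pairs
    diff_n = 0                        # total sites compared over all pairs
    joint_sum = [0] * (max_lag + 1)   # co-occurring differences per lag
    joint_n = [0] * (max_lag + 1)     # site pairs compared per lag

    for a, b in combinations(seqs, 2):
        # 1 where the two sequences differ, 0 where they agree
        d = [int(x != y) for x, y in zip(a, b)]
        diff_sum += sum(d)
        diff_n += length
        for lag in range(1, max_lag + 1):
            joint_sum[lag] += sum(d[i] * d[i + lag] for i in range(length - lag))
            joint_n[lag] += length - lag

    d_s = diff_sum / diff_n
    profile = []
    for lag in range(1, max_lag + 1):
        joint = joint_sum[lag] / joint_n[lag]
        # conditional probability = joint probability / marginal probability
        profile.append(joint / d_s if d_s > 0 else float("nan"))
    return d_s, profile


if __name__ == "__main__":
    toy = ["AAAA", "AATT", "AAAA"]
    print(substitution_profile(toy, max_lag=2))
```

In an approach of this kind, the shape of the profile as a function of the lag l is what a model with recombination would be fitted to. That fitting step, and the restriction to synonymous sites, are outside the scope of this sketch.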