Theoretical Biology and Biophysics Group, Los Alamos National Laboratory, Los Alamos, NM, USA.
Department of Ecology & Evolutionary Biology, University of Michigan, Ann Arbor, MI, USA.
Mol Biol Evol. 2024 Jun 1;41(6). doi: 10.1093/molbev/msae078.
Reassortment is an evolutionary process common in viruses with segmented genomes. These viruses can swap whole genomic segments during cellular co-infection, giving rise to novel progeny formed from the mixture of parental segments. Since large-scale genome rearrangements have the potential to generate new phenotypes, reassortment is important to both evolutionary biology and public health research. However, statistical inference of the pattern of reassortment events from phylogenetic data is exceptionally difficult, potentially involving inference of general graphs in which individual segment trees are embedded. In this paper, we argue that, in general, the number and pattern of reassortment events are not identifiable from segment trees alone, even with theoretically ideal data. We call this fact the fundamental problem of reassortment, which we illustrate using the concept of the "first-infection tree," a potentially counterfactual genealogy that would have been observed in the segment trees had no reassortment occurred. Further, we illustrate four additional problems that can arise logically in the inference of reassortment events and show, using simulated data, that these problems are not rare and can potentially distort our observation of reassortment even in small data sets. Finally, we discuss how existing methods can be augmented or adapted to account for not only the fundamental problem of reassortment, but also the four additional situations that can complicate the inference of reassortment.
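The non-identifiability claim can be made concrete with a toy example. This is a minimal sketch under stated assumptions, not the authors' method or simulation setup. Two hypothetical reassortment networks on three contemporaneous samples (A, B, C) and two segments are built. The first has no reassortment. The second has one reassortment node, R, whose two parent lineages (P1, P2) coalesce with each other before meeting B. Each segment tree is extracted by following that segment's parent at every node. Because the samples are contemporaneous, each tree is summarized by its pairwise MRCA ages, which determine an ultrametric tree's topology and branch lengths. All node names, ages, and helpers (lineage, segment_tree, n_reassortments) are illustrative inventions.

```python
from itertools import combinations

def lineage(node, seg, parent):
    """Nodes on the path from `node` to the root, following segment `seg`'s parent."""
    path = {node}
    while node in parent:  # the root has no entry in `parent`
        node = parent[node][seg]
        path.add(node)
    return path

def segment_tree(samples, seg, parent, age):
    """Embedded tree of one segment, summarized by pairwise MRCA ages (ultrametric case)."""
    paths = {s: lineage(s, seg, parent) for s in samples}
    return {
        (a, b): min(age[n] for n in paths[a] & paths[b])  # youngest shared ancestor
        for a, b in combinations(sorted(samples), 2)
    }

def n_reassortments(parent):
    """Count nodes whose segments are inherited from more than one parent."""
    return sum(len(set(p.values())) > 1 for p in parent.values())

# Hypothetical node ages (time before present); samples are contemporaneous.
AGE = {"A": 0.0, "B": 0.0, "C": 0.0, "R": 0.3, "P1": 0.5, "P2": 0.6, "Q": 1.0, "O": 2.0}
SAMPLES = ["A", "B", "C"]

# Each non-root node maps segment -> parent node.
no_reassortment = {
    "A": {1: "Q", 2: "Q"}, "B": {1: "Q", 2: "Q"},
    "C": {1: "O", 2: "O"}, "Q": {1: "O", 2: "O"},
}
one_reassortment = {
    "A": {1: "R", 2: "R"},
    "R": {1: "P1", 2: "P2"},  # reassortant: segment 1 from P1, segment 2 from P2
    "P1": {1: "Q", 2: "Q"}, "P2": {1: "Q", 2: "Q"},
    "B": {1: "Q", 2: "Q"}, "C": {1: "O", 2: "O"}, "Q": {1: "O", 2: "O"},
}

for seg in (1, 2):
    same = segment_tree(SAMPLES, seg, no_reassortment, AGE) == segment_tree(
        SAMPLES, seg, one_reassortment, AGE
    )
    print(f"segment {seg}: identical trees = {same}")
print("reassortments:", n_reassortments(no_reassortment), "vs", n_reassortments(one_reassortment))
```

Both segments yield identical trees under the two networks, including exact node ages. Yet one network contains zero reassortment events and the other contains one. Even error-free segment trees therefore cannot, in this toy case, settle how many reassortments occurred.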