Sinaimeri Blerina, Urbini Laura, Sagot Marie-France, Matias Catherine
Libera Università Internazionale degli Studi Sociali Guido Carli, Rome, Department of Business and Management, Viale Romania, 32 - 00197, Rome, Italy.
ERABLE team, Inria - Institut national de recherche en informatique et en automatique, Lyon, 56 Bd Niels Bohr, 69100 Villeurbanne, France.
Syst Biol. 2023 Dec 30;72(6):1370-1386. doi: 10.1093/sysbio/syad058.
Phylogenetic tree reconciliation is widely used to study coevolution between host and symbiont species. A key concern is the need for dependable cost values when selecting an event-based parsimonious reconciliation. Some approaches infer event probabilities specific to each pair of host and symbiont trees, which can then be converted into cost values, but they cannot model the invasion of different host species by the same symbiont species (termed a spread event), which is believed to occur in symbiotic relationships. Invasions lead to multiple associations between symbionts and their hosts (a symbiont is no longer exclusive to a single host), which are incompatible with existing coevolution methods. Here, we present AmoCoala (an enhanced version of the tool Coala), a method that provides a more realistic estimation of cophylogeny event probabilities for a given pair of host and symbiont trees, even in the presence of spread events. We extend the classical 4-event coevolutionary model with 2 additional outcomes, vertical and horizontal spreads, that lead to multiple associations. In the first step, we estimate the probabilities of spread events using heuristic frequencies. In the second step, we use an approximate Bayesian computation approach to infer the probabilities of the 4 classical events (cospeciation, duplication, host switch, and loss) based on these values. By incorporating spread events, our reconciliation model accounts for multiple associations more accurately. This improves the precision of the estimated cost sets, paving the way to a more reliable reconciliation of host and symbiont trees. To validate our method, we ran experiments on synthetic datasets and demonstrated its efficacy on real-world examples. Our results show that AmoCoala produces biologically plausible reconciliation scenarios.
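As a rough illustration of the second step, the following is a minimal sketch of approximate Bayesian computation by rejection over the 4 classical event probabilities, followed by a conversion of probabilities into costs. It is not AmoCoala's implementation. The toy simulator (which only draws event labels rather than simulating symbiont trees), the Euclidean distance on event frequencies, the flat Dirichlet prior, the negative-log cost conversion, and all function and parameter names are assumptions made for this example.

```python
import math
import random

# The 4 classical cophylogeny events named in the abstract.
EVENTS = ("cospeciation", "duplication", "host_switch", "loss")


def sample_prior(rng):
    """Draw a probability vector from a flat Dirichlet(1, 1, 1, 1) prior (assumed)."""
    draws = [rng.gammavariate(1.0, 1.0) for _ in EVENTS]
    total = sum(draws)
    return [d / total for d in draws]


def simulate_frequencies(probs, n_events, rng):
    """Toy stand-in for a tree simulator: draw events and return their frequencies."""
    counts = [0] * len(EVENTS)
    for idx in rng.choices(range(len(EVENTS)), weights=probs, k=n_events):
        counts[idx] += 1
    return [c / n_events for c in counts]


def abc_rejection(observed, n_events, n_draws, keep_fraction, seed=0):
    """Keep the prior draws whose simulated summary is closest to the observed one.

    Returns the mean of the accepted probability vectors.
    """
    rng = random.Random(seed)
    scored = []
    for _ in range(n_draws):
        probs = sample_prior(rng)
        simulated = simulate_frequencies(probs, n_events, rng)
        scored.append((math.dist(observed, simulated), probs))
    scored.sort(key=lambda pair: pair[0])
    n_keep = max(1, int(n_draws * keep_fraction))
    kept = [probs for _, probs in scored[:n_keep]]
    return [sum(p[i] for p in kept) / len(kept) for i in range(len(EVENTS))]


def probabilities_to_costs(probs):
    """Assumed conversion: cost = -ln(probability), so likelier events cost less."""
    return [-math.log(p) for p in probs]


if __name__ == "__main__":
    # Hypothetical observed event frequencies, used only to exercise the sketch.
    observed = [0.6, 0.1, 0.1, 0.2]
    estimate = abc_rejection(observed, n_events=200, n_draws=2000, keep_fraction=0.05)
    for name, p, c in zip(EVENTS, estimate, probabilities_to_costs(estimate)):
        print(f"{name}: probability={p:.3f} cost={c:.3f}")
```

In an actual cophylogeny setting, the simulator would generate symbiont trees along the host tree and the distance would compare simulated and observed trees; the rejection loop itself would keep the same shape.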