Inference of Phylogenetic Networks From Sequence Data Using Composite Likelihood.

Author information

Kong Sungsik, Swofford David L, Kubatko Laura S

Affiliations

Department of Evolution, Ecology, and Organismal Biology, The Ohio State University, Columbus, OH 43210, USA.

Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI 53715, USA.

Publication information

Syst Biol. 2025 Feb 10;74(1):53-69. doi: 10.1093/sysbio/syae054.

Abstract

While phylogenies have been essential in understanding how species evolve, they do not adequately describe some evolutionary processes. For instance, hybridization, a common phenomenon where interbreeding between 2 species leads to the formation of a new species, must be depicted by a phylogenetic network, a structure that modifies a phylogenetic tree by allowing 2 branches to merge into 1, resulting in reticulation. However, existing methods for estimating networks become computationally expensive as the dataset size and/or topological complexity increase. The lack of methods for scalable inference hampers phylogenetic networks from being widely used in practice, despite accumulating evidence that hybridization occurs frequently in nature. Here, we propose a novel method, PhyNEST (Phylogenetic Network Estimation using SiTe patterns), that estimates binary, level-1 phylogenetic networks with a fixed, user-specified number of reticulations directly from sequence data. By using the composite likelihood as the basis for inference, PhyNEST is able to use the full genomic data in a computationally tractable manner, eliminating the need to summarize the data as a set of gene trees prior to network estimation. To search network space, PhyNEST implements both hill climbing and simulated annealing algorithms. PhyNEST assumes that the data are composed of coalescent independent sites that evolve according to the Jukes-Cantor substitution model and that the network has a constant effective population size. Simulation studies demonstrate that PhyNEST is often more accurate than 2 existing composite likelihood summary methods (SNaQ and PhyloNet) and that it is robust to at least one form of model misspecification (assuming a less complex nucleotide substitution model than the true generating model). 
We applied PhyNEST to reconstruct the evolutionary relationships among Heliconius butterflies and Papionini primates, characterized by hybrid speciation and widespread introgression, respectively. PhyNEST is implemented in an open-source Julia package and is publicly available at https://github.com/sungsik-kong/PhyNEST.jl.
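The abstract names two ingredients of PhyNEST's inference: a composite likelihood and the Jukes-Cantor (JC69) substitution model. As a loose illustration of those two ideas only, the Python sketch below computes a *pairwise* composite log-likelihood, in which each pair of aligned sequences contributes its own JC69 log-likelihood and the contributions are summed as if they were independent. This is a deliberate simplification and not PhyNEST's method: PhyNEST works with quartet site-pattern probabilities under the coalescent on a network, which this sketch does not model. All function names and the toy alignment are hypothetical.

```python
import math
from itertools import combinations


def jc_same(t: float) -> float:
    """JC69: probability a site shows the same base at both ends of a path of length t."""
    return 0.25 + 0.75 * math.exp(-4.0 * t / 3.0)


def jc_diff(t: float) -> float:
    """JC69: probability of ending in one *specific* different base after length t."""
    return 0.25 - 0.25 * math.exp(-4.0 * t / 3.0)


def jc_distance(seq_a: str, seq_b: str) -> float:
    """Closed-form JC69 distance; this is also the pairwise maximum-likelihood estimate.

    Assumes the proportion of differing sites p is below 3/4.
    """
    p = sum(a != b for a, b in zip(seq_a, seq_b)) / len(seq_a)
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def pair_log_lik(seq_a: str, seq_b: str, t: float) -> float:
    """Log-likelihood of one pair of aligned sequences, treating sites as independent."""
    same = sum(a == b for a, b in zip(seq_a, seq_b))
    diff = len(seq_a) - same
    ll = 0.0
    # Under JC69 the stationary base frequency is 1/4, so a site pattern (x, y)
    # has probability 1/4 * P(y | x, t). Zero-count terms are skipped to avoid log(0).
    if same:
        ll += same * math.log(0.25 * jc_same(t))
    if diff:
        ll += diff * math.log(0.25 * jc_diff(t))
    return ll


def pairwise_mle(alignment: dict) -> dict:
    """JC69 distance for every unordered pair of taxa (keys sorted alphabetically)."""
    return {
        (a, b): jc_distance(alignment[a], alignment[b])
        for a, b in combinations(sorted(alignment), 2)
    }


def composite_log_lik(alignment: dict, dist: dict) -> float:
    """Composite log-likelihood: the sum of the pairwise marginal log-likelihoods."""
    return sum(
        pair_log_lik(alignment[a], alignment[b], dist[(a, b)])
        for a, b in combinations(sorted(alignment), 2)
    )


# Hypothetical toy alignment, for illustration only.
TOY_ALIGNMENT = {
    "A": "ACGTACGTAC",
    "B": "ACGTACGTTC",
    "C": "ACGAACGGTC",
}

if __name__ == "__main__":
    mle = pairwise_mle(TOY_ALIGNMENT)
    print("pairwise JC distances:", mle)
    print("composite log-likelihood at MLE:", composite_log_lik(TOY_ALIGNMENT, mle))
```

Summing marginal log-likelihoods of small subsets is what keeps a composite likelihood cheap to evaluate. The price is that the product of those marginals is not the true joint likelihood of the data. In a network search such as the hill climbing or simulated annealing mentioned above, a score of this kind would be recomputed for each candidate network.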
