Species Tree Inference Using a Mixture Model.

Affiliations

School of Computer Science and Communication, Science for Life Laboratory, KTH Royal Institute of Technology, Stockholm, Sweden.

Department of Computer Science, Helsinki Institute for Information Technology HIIT, Aalto University, Espoo, Finland.

Publication information

Mol Biol Evol. 2015 Sep;32(9):2469-82. doi: 10.1093/molbev/msv115. Epub 2015 May 11.

Abstract

Species tree reconstruction has been a subject of substantial research due to its central role across biology and medicine. A species tree is often reconstructed using a set of gene trees or by directly using sequence data. In either case, one of the main confounding phenomena is the discordance between a species tree and a gene tree due to evolutionary events such as duplications and losses. Probabilistic methods can resolve the discordance by coestimating gene trees and the species tree, but this approach poses a scalability problem for larger data sets. We present MixTreEM-DLRS, a two-phase approach for reconstructing a species tree in the presence of gene duplications and losses. In the first phase, MixTreEM, a novel structural expectation maximization algorithm based on a mixture model, is used to reconstruct a set of candidate species trees, given sequence data for monocopy gene families from the genomes under study. In the second phase, PrIME-DLRS, a method based on the DLRS model (Åkerborg O, Sennblad B, Arvestad L, Lagergren J. 2009. Simultaneous Bayesian gene tree reconstruction and reconciliation analysis. Proc Natl Acad Sci U S A. 106(14):5714-5719), is used for selecting the best species tree. PrIME-DLRS can handle multicopy gene families since DLRS, apart from modeling sequence evolution, models gene duplication and loss using a gene evolution model (Arvestad L, Lagergren J, Sennblad B. 2009. The gene evolution model and computing its associated probabilities. J ACM. 56(2):1-44). We evaluate MixTreEM-DLRS using synthetic and biological data, and compare its performance with a recent genome-scale species tree reconstruction method, PHYLDOG (Boussau B, Szöllősi GJ, Duret L, Gouy M, Tannier E, Daubin V. 2013. Genome-scale coestimation of species and gene trees. Genome Res. 23(2):323-330), as well as with a fast parsimony-based algorithm, Duptree (Wehe A, Bansal MS, Burleigh JG, Eulenstein O. 2008. Duptree: a program for large-scale phylogenetic analyses using gene tree parsimony. Bioinformatics 24(13):1540-1541). Our method is competitive with PHYLDOG in terms of accuracy and runs significantly faster, and it outperforms Duptree in accuracy. MixTreEM on its own, without DLRS, may also be used for selecting the target species tree, yielding a fast yet accurate algorithm for larger data sets. MixTreEM is freely available at http://prime.scilifelab.se/mixtreem/.
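The abstract names expectation maximization over a mixture model but does not spell out the updates. As a rough illustration only, and not the authors' MixTreEM algorithm, the sketch below runs plain EM for the mixing weights of a mixture whose components are fixed. It assumes a hypothetical table `loglik[g][k]` holding the log-likelihood of gene family `g` under candidate tree `k`; the function name, the inputs, and the stopping rule are all assumptions, and the structural step that would update the trees themselves is omitted.

```python
import math


def em_mixture_weights(loglik, n_iter=100, tol=1e-9):
    """Generic EM for mixture weights with fixed components (illustrative only).

    loglik[g][k] is a hypothetical log-likelihood of gene family g under
    candidate tree k. Returns (weights, responsibilities).
    """
    n, k = len(loglik), len(loglik[0])
    weights = [1.0 / k] * k  # start from uniform mixing weights
    resp = []
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each family,
        # computed in log space and normalized with the max-shift trick.
        resp = []
        for row in loglik:
            scores = [
                math.log(w) + l if w > 0 else float("-inf")
                for w, l in zip(weights, row)
            ]
            m = max(scores)
            exps = [math.exp(s - m) for s in scores]
            total = sum(exps)
            resp.append([e / total for e in exps])
        # M-step: each weight becomes the average responsibility of its component.
        new_weights = [sum(r[j] for r in resp) / n for j in range(k)]
        converged = max(abs(a - b) for a, b in zip(weights, new_weights)) < tol
        weights = new_weights
        if converged:
            break
    return weights, resp
```

For example, calling `em_mixture_weights([[-1.0, -50.0], [-2.0, -60.0], [-55.0, -1.5]])` assigns the first two families almost entirely to the first component and the third to the second, so the weights settle near two thirds and one third.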
