Efficient genome-scale phylogenetic analysis under the duplication-loss and deep coalescence cost models.

Affiliation

School of Computer Science, Tel Aviv University, Tel Aviv 69978, Israel.

Publication information

BMC Bioinformatics. 2010 Jan 18;11 Suppl 1(Suppl 1):S42. doi: 10.1186/1471-2105-11-S1-S42.

Abstract

BACKGROUND

Genomic data provide a wealth of new information for phylogenetic analysis. Yet making use of these data requires phylogenetic methods that can efficiently analyze extremely large data sets and account for processes of gene evolution, such as gene duplication and loss, incomplete lineage sorting (deep coalescence), or horizontal gene transfer, that cause incongruence among gene trees. One such approach is gene tree parsimony, which, given a set of gene trees, seeks a species tree that requires the smallest number of evolutionary events to explain the incongruence of the gene trees. However, the only existing algorithms for gene tree parsimony under the duplication-loss or deep coalescence reconciliation cost are prohibitively slow for large datasets.
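
To make the gene tree parsimony objective concrete, the following is a minimal sketch of how duplications might be counted when a gene tree is reconciled with a candidate species tree. It assumes the standard LCA-mapping definition of reconciliation, which the abstract itself does not spell out, and it counts duplications only, not losses. The nested-tuple tree encoding and the names `clusters`, `count_duplications`, and `parsimony_score` are illustrative assumptions, not taken from the paper.

```python
def clusters(species_tree):
    """Return the leaf set (cluster) of every node in a rooted species tree."""
    found = []

    def walk(node):
        if isinstance(node, tuple):
            cluster = frozenset().union(*(walk(child) for child in node))
        else:
            cluster = frozenset([node])
        found.append(cluster)
        return cluster

    walk(species_tree)
    return found


def count_duplications(gene_tree, species_tree):
    """Count gene tree nodes that the LCA mapping labels as duplications.

    Trees are nested 2-tuples; gene tree leaves carry species names.
    A species node is identified by its cluster (set of leaf names).
    """
    species_clusters = clusters(species_tree)

    def lca(leaves):
        # Smallest species cluster that contains all the given leaves.
        return min((c for c in species_clusters if leaves <= c), key=len)

    duplications = 0

    def walk(node):
        nonlocal duplications
        if not isinstance(node, tuple):
            return frozenset([node])
        child_maps = [walk(child) for child in node]
        mapped = lca(frozenset().union(*child_maps))
        # A node is a duplication if it maps to the same species node as a child.
        if mapped in child_maps:
            duplications += 1
        return mapped

    walk(gene_tree)
    return duplications


def parsimony_score(gene_trees, species_tree):
    """Total duplication count over a collection of gene trees."""
    return sum(count_duplications(g, species_tree) for g in gene_trees)


species = (("a", "b"), "c")
genes = [(("a", "b"), "c"), (("a", "c"), "b"), (("a", "a"), "b")]
score = parsimony_score(genes, species)
```

In this toy input, the first gene tree agrees with the species tree, while the other two each need one duplication. A gene tree parsimony search would look for the species tree that minimizes such a total.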

RESULTS

We describe novel algorithms for SPR and TBR based local search heuristics under the duplication-loss cost, and we show how they can be adapted for the deep coalescence cost. These algorithms improve upon the best existing algorithms for these problems by a factor of n, where n is the number of species in the collection of gene trees. We implemented our new SPR based local search algorithm for the duplication-loss cost and demonstrate the tremendous improvement in runtime and scalability it provides compared to existing implementations. We also evaluate the performance of our algorithm on three large-scale genomic data sets.
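
For readers unfamiliar with SPR based local search, the sketch below shows a naive version on rooted binary trees: prune a subtree, regraft it on every edge of the remaining tree, and move to a neighbor whenever the cost improves. It evaluates every neighbor from scratch through a caller-supplied `cost` function, so it illustrates only the search scheme and not the faster algorithms described above. The nested-tuple encoding and all function names are assumptions made for illustration, and leaf labels are assumed to be unique.

```python
def canonical(tree):
    """Order-independent form of a rooted tree, so mirror images compare equal."""
    if isinstance(tree, tuple):
        return frozenset(canonical(child) for child in tree)
    return tree


def subtrees(tree):
    """Yield every subtree, including the tree itself and its leaves."""
    yield tree
    if isinstance(tree, tuple):
        for child in tree:
            yield from subtrees(child)


def prune(tree, target):
    """Remove `target` from `tree` and suppress the resulting unary node."""
    if tree == target:
        return None
    if not isinstance(tree, tuple):
        return tree
    left, right = prune(tree[0], target), prune(tree[1], target)
    if left is None:
        return right
    if right is None:
        return left
    return (left, right)


def regraft(tree, pruned):
    """Yield every tree obtained by attaching `pruned` on an edge of `tree`."""
    yield (tree, pruned)  # attach above the current node
    if isinstance(tree, tuple):
        left, right = tree
        for new_left in regraft(left, pruned):
            yield (new_left, right)
        for new_right in regraft(right, pruned):
            yield (left, new_right)


def spr_neighbors(tree):
    """Return the distinct trees one SPR move away from `tree`."""
    seen = {canonical(tree)}
    neighbors = []
    for target in subtrees(tree):
        if target == tree:
            continue  # the whole tree cannot be pruned from itself
        rest = prune(tree, target)
        for candidate in regraft(rest, target):
            key = canonical(candidate)
            if key not in seen:
                seen.add(key)
                neighbors.append(candidate)
    return neighbors


def local_search(start, cost):
    """Hill-climb over SPR neighborhoods until no neighbor lowers the cost."""
    current, best = start, cost(start)
    improved = True
    while improved:
        improved = False
        for candidate in spr_neighbors(current):
            candidate_cost = cost(candidate)
            if candidate_cost < best:
                current, best, improved = candidate, candidate_cost, True
                break
    return current, best
```

Any reconciliation cost can be passed as `cost`, for example a total duplication count over a set of gene trees. Because each neighbor is scored independently here, the work per search step grows quickly with the number of species, which is the bottleneck that faster neighborhood-evaluation algorithms aim to remove.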

CONCLUSION

Our new algorithms enable, for the first time, gene tree parsimony analyses of thousands of genes from hundreds of taxa using the duplication-loss and deep coalescence reconciliation costs. Thus, this work expands both the size of data sets and the range of evolutionary models that can be incorporated into genome-scale phylogenetic analyses.
