Consensus properties for the deep coalescence problem and their application for scalable tree search.

Affiliation

Department of Computer Science, Iowa State University, Ames, IA, USA.

Publication information

BMC Bioinformatics. 2012 Jun 25;13 Suppl 10(Suppl 10):S12. doi: 10.1186/1471-2105-13-S10-S12.

DOI: 10.1186/1471-2105-13-S10-S12

PMID: 22759417

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3382448/

Abstract

BACKGROUND

To infer a species phylogeny from unlinked genes, phylogenetic inference methods must confront the biological processes that create incongruence between gene trees and the species phylogeny. Intra-specific gene variation in ancestral species can result in deep coalescence, also known as incomplete lineage sorting, which creates incongruence between gene trees and the species tree. One approach to account for deep coalescence in phylogenetic analyses is the deep coalescence problem, which takes a collection of gene trees and seeks the species tree that implies the fewest deep coalescence events. Although this approach is promising for phylogenetics, the consensus properties of this problem are mostly unknown and analyses of large data sets may be computationally prohibitive.
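To make the optimization problem concrete, the following Python sketch scores a candidate species tree by counting extra lineages: for every non-root branch of the species tree it counts the gene lineages entering that branch, adds that number minus one, and sums the result over all gene trees. It is an illustration under stated assumptions, not the authors' implementation: trees are nested tuples, gene-tree leaves are labeled directly with species names, every gene tree is assumed to contain every species, and the helper names (`deep_coalescence`, `mdc_brute_force`, and so on) are hypothetical. The exhaustive search over all rooted binary trees is only feasible for a handful of taxa, which is exactly the scalability problem described above.

```python
"""Sketch: deep coalescence (extra-lineage) cost and brute-force species tree search.

Assumptions (illustrative only): trees are nested tuples of species names,
gene-tree leaves are labeled by species, and every gene tree contains every species.
"""
from typing import Iterator, Optional, Union

Tree = Union[str, tuple]


def leaves(t: Tree) -> frozenset:
    """Set of species below a node."""
    if isinstance(t, str):
        return frozenset([t])
    return frozenset().union(*(leaves(c) for c in t))


def species_clusters(s: Tree) -> list:
    """All clusters (leaf sets) of a species tree, root included."""
    out = [leaves(s)]
    if not isinstance(s, str):
        for child in s:
            out.extend(species_clusters(child))
    return out


def gene_nodes(g: Tree, parent: Optional[frozenset] = None) -> Iterator:
    """Yield (species set of node, species set of its parent or None)."""
    here = leaves(g)
    yield here, parent
    if not isinstance(g, str):
        for child in g:
            yield from gene_nodes(child, here)


def deep_coalescence(g: Tree, s: Tree) -> int:
    """Extra lineages of gene tree g in species tree s (hypothetical helper)."""
    root = leaves(s)
    nodes = list(gene_nodes(g))
    cost = 0
    for cluster in species_clusters(s):
        if cluster == root:
            continue  # lineages above the species-tree root are not counted
        # A gene lineage enters this branch if its node maps inside the cluster
        # while its parent (if any) maps outside it.
        k = sum(1 for here, parent in nodes
                if here <= cluster and (parent is None or not parent <= cluster))
        cost += k - 1
    return cost


def insert_everywhere(t: Tree, x: str) -> Iterator[Tree]:
    """All binary trees obtained by attaching leaf x above some node of t."""
    yield (t, x)
    if not isinstance(t, str):
        left, right = t
        for new_left in insert_everywhere(left, x):
            yield (new_left, right)
        for new_right in insert_everywhere(right, x):
            yield (left, new_right)


def all_rooted_binary_trees(taxa: list) -> Iterator[Tree]:
    """Every rooted binary tree on the taxa; there are (2n - 3)!! of them."""
    if len(taxa) == 1:
        yield taxa[0]
        return
    for t in all_rooted_binary_trees(taxa[:-1]):
        yield from insert_everywhere(t, taxa[-1])


def mdc_brute_force(gene_trees: list, taxa: list) -> Tree:
    """Species tree minimizing the summed deep coalescence cost (tiny inputs only)."""
    return min(all_rooted_binary_trees(taxa),
               key=lambda s: sum(deep_coalescence(g, s) for g in gene_trees))


if __name__ == "__main__":
    genes = [(("A", "B"), ("C", "D")),
             (("A", "B"), ("C", "D")),
             (("A", "C"), ("B", "D"))]
    best = mdc_brute_force(genes, ["A", "B", "C", "D"])
    print(best, sum(deep_coalescence(g, best) for g in genes))
```

With these three toy gene trees the search returns ((A,B),(C,D)) with a total cost of 2: the discordant gene tree forces one extra lineage on each of the branches above {A,B} and {C,D}.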

RESULTS

We prove that the deep coalescence consensus tree problem satisfies the highly desirable Pareto property for clusters (clades). That is, in every instance, each cluster that is present in all of the input gene trees, called a consensus cluster, is also found in every optimal solution. Moreover, we introduce a new divide and conquer method for the deep coalescence problem based on the Pareto property. This method refines the strict consensus of the input gene trees. In practice this often greatly reduces the complexity of the tree search, and it guarantees that the estimated species tree satisfies the Pareto property.
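The Pareto property is what makes the divide and conquer idea work: if every optimal species tree contains the consensus clusters, the search can be confined to binary refinements of the strict consensus. The Python sketch below illustrates only that reduction. It is not the authors' divide and conquer implementation, and the tree encoding and helper names (`consensus_clusters`, `satisfies_pareto`, `count_binary_refinements`) are assumptions made for illustration. It computes the clusters shared by all gene trees, checks whether a candidate species tree satisfies the Pareto property, and counts the remaining candidates using the standard fact that a node with d children has (2d - 3)!! binary resolutions.

```python
"""Sketch: consensus clusters, a Pareto check, and the size of the restricted search space.

Assumptions (illustrative only): trees are nested tuples of taxon names over the
same taxon set; helper names are hypothetical, not the authors' code.
"""
from math import prod
from typing import Union

Tree = Union[str, tuple]


def leaves(t: Tree) -> frozenset:
    """Set of taxa below a node."""
    if isinstance(t, str):
        return frozenset([t])
    return frozenset().union(*(leaves(c) for c in t))


def clusters(t: Tree) -> set:
    """All clusters of a rooted tree, including the root and the single leaves."""
    out = {leaves(t)}
    if not isinstance(t, str):
        for child in t:
            out |= clusters(child)
    return out


def consensus_clusters(gene_trees: list) -> set:
    """Clusters present in every gene tree, i.e. the clusters of the strict consensus."""
    return set.intersection(*(clusters(g) for g in gene_trees))


def satisfies_pareto(species_tree: Tree, gene_trees: list) -> bool:
    """True if the species tree contains every consensus cluster."""
    return consensus_clusters(gene_trees) <= clusters(species_tree)


def double_factorial(n: int) -> int:
    """n!! for odd n >= 1 (the empty product for n <= 0 is 1)."""
    return prod(range(n, 0, -2))


def count_binary_refinements(cons: set) -> int:
    """Number of rooted binary trees that contain every cluster in `cons`.

    A node with d children can be resolved in (2d - 3)!! ways, independently of
    the other nodes, so the per-node counts multiply.
    """
    total = 1
    for c in cons:
        inside = [x for x in cons if x < c]
        # Children of c are the maximal consensus clusters strictly inside c.
        children = [x for x in inside if not any(x < y for y in inside)]
        if children:
            total *= double_factorial(2 * len(children) - 3)
    return total


if __name__ == "__main__":
    g1 = ((("A", "B"), "C"), ("D", "E"))
    g2 = (("A", "B"), ("C", ("D", "E")))
    cons = consensus_clusters([g1, g2])
    print(sorted(sorted(c) for c in cons if 1 < len(c) < 5))
    print(count_binary_refinements(cons), "of", double_factorial(2 * 5 - 3))
    print(satisfies_pareto((("A", "C"), ("B", ("D", "E"))), [g1, g2]))
```

In this example the consensus clusters are {A, B} and {D, E}, so only the root polytomy over (A B), C and (D E) remains to be resolved: 3 candidate species trees instead of all 105 rooted binary trees on five taxa. On real data the reduction depends entirely on how well resolved the strict consensus is; if the gene trees share no nontrivial clusters, the consensus is a star and nothing is gained.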

CONCLUSIONS

Analyses of both simulated and empirical data sets demonstrate that the divide and conquer method can greatly improve upon the speed of heuristics that do not consider the Pareto consensus property, while also guaranteeing that the proposed solution fulfills the Pareto property. The divide and conquer method extends the utility of the deep coalescence problem to data sets with enormous numbers of taxa.

Similar articles

Consensus properties and their large-scale applications for the gene duplication problem.

J Bioinform Comput Biol. 2016 Jun;14(3):1642005. doi: 10.1142/S0219720016420051. Epub 2016 Mar 6.

Efficient error correction algorithms for gene tree reconciliation based on duplication, duplication and loss, and deep coalescence.

BMC Bioinformatics. 2012 Jun 25;13 Suppl 10(Suppl 10):S11. doi: 10.1186/1471-2105-13-S10-S11.

Algorithms for genome-scale phylogenetics using gene tree parsimony.

IEEE/ACM Trans Comput Biol Bioinform. 2013 Jul-Aug;10(4):939-56. doi: 10.1109/TCBB.2013.103.

Synthesizing large-scale species trees using the strict consensus approach.

J Bioinform Comput Biol. 2017 Jun;15(3):1740002. doi: 10.1142/S0219720017400029. Epub 2017 Apr 20.

Efficient genome-scale phylogenetic analysis under the duplication-loss and deep coalescence cost models.

BMC Bioinformatics. 2010 Jan 18;11 Suppl 1(Suppl 1):S42. doi: 10.1186/1471-2105-11-S1-S42.

Mathematical properties of the deep coalescence cost.

IEEE/ACM Trans Comput Biol Bioinform. 2013 Jan-Feb;10(1):61-72. doi: 10.1109/TCBB.2012.133.

iGTP: a software package for large-scale gene tree parsimony analysis.

BMC Bioinformatics. 2010 Nov 23;11:574. doi: 10.1186/1471-2105-11-574.

Triplet supertree heuristics for the tree of life.

BMC Bioinformatics. 2009 Jan 30;10 Suppl 1(Suppl 1):S8. doi: 10.1186/1471-2105-10-S1-S8.

The gene tree delusion.

Mol Phylogenet Evol. 2016 Jan;94(Pt A):1-33. doi: 10.1016/j.ympev.2015.07.018. Epub 2015 Jul 31.

Cited by

Exact median-tree inference for unrooted reconciliation costs.

BMC Evol Biol. 2020 Oct 28;20(Suppl 1):136. doi: 10.1186/s12862-020-01700-w.

Are the duplication cost and Robinson-Foulds distance equivalent?

J Comput Biol. 2014 Aug;21(8):578-90. doi: 10.1089/cmb.2014.0021. Epub 2014 Jul 2.
