Department of Biology, University of Florida, Gainesville, FL 32609, USA.
Syst Biol. 2011 Mar;60(2):117-25. doi: 10.1093/sysbio/syq072. Epub 2010 Dec 24.
Phylogenetic analyses using genome-scale data sets must confront incongruence among gene trees, which in plants is exacerbated by frequent gene duplications and losses. Gene tree parsimony (GTP) is a phylogenetic optimization criterion in which a species tree that minimizes the number of gene duplications induced among a set of gene trees is selected. The run-time performance of previous implementations has limited its use on large-scale data sets. We used new software that incorporates recent algorithmic advances to examine the performance of GTP on a plant data set consisting of 18,896 gene trees containing 510,922 protein sequences from 136 plant taxa (giving a combined alignment length of >2.9 million characters). The relationships inferred from the GTP analysis were largely consistent with previous large-scale studies of backbone plant phylogeny and resolved some controversial nodes. The placement of taxa that were present in few gene trees generally varied the most among GTP bootstrap replicates. Excluding these taxa either before or after the GTP analysis revealed high levels of phylogenetic support across plants. The analyses supported magnoliids as sister to a eudicot + monocot clade and did not support the eurosid I and II clades. This study presents a nuclear genomic perspective on the broad-scale phylogenetic relationships among plants, and it demonstrates that nuclear genes with a history of duplication and loss can be phylogenetically informative for resolving the plant tree of life.
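To make the GTP criterion concrete, the following minimal sketch scores candidate species trees by the number of duplications they imply across a set of gene trees. It uses least-common-ancestor (LCA) mapping, a common way to count duplications when reconciling a gene tree with a species tree; the abstract does not describe how the authors' software counts them, so this should not be read as their implementation. The nested-tuple tree encoding, the function names, and the three-taxon toy trees are all assumptions made for illustration, and the code assumes rooted binary gene trees whose leaves are labeled with species names.

```python
# Illustrative sketch only: duplication counting by LCA mapping.
# Trees are nested 2-tuples; leaves are species-name strings (assumed encoding).

def index_species_tree(species_tree):
    """Return (parent, depth) dicts for every node of the species tree."""
    parent, depth = {}, {}
    stack = [(species_tree, None, 0)]
    while stack:
        node, par, d = stack.pop()
        parent[node] = par
        depth[node] = d
        if isinstance(node, tuple):
            for child in node:
                stack.append((child, node, d + 1))
    return parent, depth


def lca(a, b, parent, depth):
    """Least common ancestor of two species-tree nodes."""
    while a != b:
        # Move the deeper node (or either one, on a tie) towards the root.
        if depth[a] >= depth[b]:
            a = parent[a]
        else:
            b = parent[b]
    return a


def count_duplications(gene_tree, parent, depth):
    """Count gene-tree nodes that map to the same species node as a child."""
    def visit(g):
        if not isinstance(g, tuple):
            return g, 0  # a leaf maps to its own species; no duplication
        left, right = g
        m_left, d_left = visit(left)
        m_right, d_right = visit(right)
        m = lca(m_left, m_right, parent, depth)
        is_dup = (m == m_left) or (m == m_right)
        return m, d_left + d_right + int(is_dup)
    return visit(gene_tree)[1]


def gtp_score(species_tree, gene_trees):
    """Total duplications implied by a species tree over all gene trees."""
    parent, depth = index_species_tree(species_tree)
    return sum(count_duplications(g, parent, depth) for g in gene_trees)


# Hypothetical toy data: two candidate species trees and three gene trees.
candidate_1 = (("A", "B"), "C")
candidate_2 = (("A", "C"), "B")
gene_trees = [(("A", "B"), "C"), (("A", "B"), "C"), (("A", "C"), "B")]

scores = {t: gtp_score(t, gene_trees) for t in (candidate_1, candidate_2)}
best = min(scores, key=scores.get)  # GTP prefers the lowest duplication count
```

In this toy case the first candidate implies fewer duplications and would be preferred. A real GTP analysis also has to search the space of species trees rather than compare two fixed candidates, which is the step that makes large data sets computationally demanding.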