Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA 02138, USA.
Syst Biol. 2009 Oct;58(5):468-77. doi: 10.1093/sysbio/syp031. Epub 2009 Jul 16.
The estimation of species trees (phylogenies) is one of the most important problems in evolutionary biology, and recently there has been greater appreciation of the need to estimate species trees directly rather than using gene trees as a surrogate. A Bayesian method constructed under the multispecies coalescent model can consistently estimate species trees, but it is computationally intensive, which can hinder its application to the phylogenetic analysis of large-scale genomic data. Many summary-statistics-based approaches, such as shallowest coalescences (SC) and Global LAteSt Split (GLASS), have been developed to infer species phylogenies from multilocus data sets. In this paper, we propose 2 methods based on summary statistics of coalescence times: species tree estimation using average ranks of coalescences (STAR) and species tree estimation using average coalescence times (STEAC). It can be shown that both methods are statistically consistent under the multispecies coalescent model. Because STAR uses only the ranks of coalescences, it is resistant to variable substitution rates along the branches of gene trees. A simulation study suggests that STAR consistently outperforms STEAC, SC, and GLASS when substitution rates among lineages are highly variable. Analyses of two real genomic data sets with the 2 methods produced species trees that are consistent with previous results.
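The abstract describes STAR and STEAC only as estimators built on averaged summary statistics of coalescence times. The sketch below is one minimal reading of that idea and is not the authors' implementation. It makes several assumptions:

- Gene trees are rooted nested tuples with node heights, and every gene tree contains every species.
- STEAC is read as averaging, over gene trees, the height of each species pair's most recent common ancestor.
- The rank variant follows our own reading of STAR. Each internal edge is given length 1 and tips are extended so that the tree is ultrametric; the exact ranking convention in the paper may differ.
- The final clustering step uses UPGMA (average linkage) as one simple distance method. It may not be the method used in the paper.

The helper names `pairwise_times`, `average_times`, and `upgma` are hypothetical.

```python
from itertools import combinations

# Hypothetical gene-tree encoding: a leaf is a species name (str); an internal
# node is (height, left, right), where height is that node's coalescence time.


def max_internal_depth(tree, depth=0):
    """Largest number of edges from the root to any internal node (-1 for a leaf)."""
    if isinstance(tree, str):
        return -1
    _, left, right = tree
    return max(depth, max_internal_depth(left, depth + 1), max_internal_depth(right, depth + 1))


def pairwise_times(tree, use_ranks=False):
    """Map each unordered species pair to the height of its MRCA in one gene tree.

    With use_ranks=True, heights are replaced by rank-based heights: internal
    edges get length 1 and the deepest internal node sits at height 1
    (an assumed reading of STAR, so only the topology matters).
    """
    top = max_internal_depth(tree) + 1
    times = {}

    def visit(node, depth):
        if isinstance(node, str):
            return {node}
        height, left, right = node
        below_left = visit(left, depth + 1)
        below_right = visit(right, depth + 1)
        value = (top - depth) if use_ranks else height
        # Every pair split by this node coalesces exactly here.
        for a in below_left:
            for b in below_right:
                times[frozenset((a, b))] = value
        return below_left | below_right

    visit(tree, 0)
    return times


def average_times(gene_trees, use_ranks=False):
    """Average each pair's (rank-based) coalescence time across gene trees."""
    totals = {}
    for tree in gene_trees:
        for pair, t in pairwise_times(tree, use_ranks).items():
            totals[pair] = totals.get(pair, 0.0) + t
    return {pair: s / len(gene_trees) for pair, s in totals.items()}


def upgma(avg):
    """Build a rooted topology by average-linkage clustering of the averaged times."""
    clusters = {frozenset([s]): s for pair in avg for s in pair}

    def linkage(c1, c2):
        return sum(avg[frozenset((a, b))] for a in c1 for b in c2) / (len(c1) * len(c2))

    while len(clusters) > 1:
        # Sort clusters by their sorted member names so ties break deterministically.
        c1, c2 = min(combinations(sorted(clusters, key=sorted), 2), key=lambda p: linkage(*p))
        clusters[c1 | c2] = (clusters.pop(c1), clusters.pop(c2))
    return next(iter(clusters.values()))


gene_trees = [
    (3.0, (1.0, "A", "B"), (2.0, "C", "D")),
    (3.2, (0.9, "A", "B"), (2.2, "C", "D")),
    (2.8, (1.1, "A", "C"), (2.0, "B", "D")),  # one discordant gene tree
]
print(upgma(average_times(gene_trees)))                  # (('A', 'B'), ('C', 'D'))
print(upgma(average_times(gene_trees, use_ranks=True)))  # (('A', 'B'), ('C', 'D'))

# Inflate the discordant tree's times tenfold, as a fast-evolving locus might.
scaled = gene_trees[:2] + [(28.0, (11.0, "A", "C"), (20.0, "B", "D"))]
print(upgma(average_times(scaled)))                      # (('A', 'C'), ('B', 'D'))
print(upgma(average_times(scaled, use_ranks=True)))      # (('A', 'B'), ('C', 'D'))
```

In this toy example, rescaling a single gene tree flips the time-averaged (STEAC-like) topology. The rank-based (STAR-like) estimate is unchanged, because scaling a tree's heights does not alter its ranks. This mirrors the abstract's point about variable substitution rates, but it is only an illustration and not a reproduction of the paper's simulation study.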