Allan Wilson Centre for Molecular Ecology and Evolution, Massey University, Palmerston North, New Zealand.
Syst Biol. 2009 Feb;58(1):100-13. doi: 10.1093/sysbio/syp013. Epub 2009 May 25.
Several methods have recently been developed that allow the reconstruction of species trees from gene trees, an important achievement in our ongoing quest to obtain reliable species phylogenies. However, considerably less attention has been given to evaluating the accuracy of species tree estimates. Four methods for measuring branch support of species trees are tested in this study in a gene tree parsimony framework: 1) bootstrap lineages (BL) (sequences) within species, 2) bootstrap characters (BC) within genes (i.e., the standard nonparametric bootstrap), 3) bootstrap lineages and characters (BLC), and 4) posterior probability gene tree sampling (PPGTS) (where, for each resampled data set, gene trees are sampled according to their posterior probability). For each method, n species trees are reconstructed from n resampled data sets, and the support for a branch is the percentage of the n species trees in which that branch is recovered. The 4 methods were tested for several species trees and for different sampling efforts (i.e., number of genes and individuals sampled) using coalescent simulations. PPGTS performed best overall, with the lowest Type I and II error rates, followed by BLC. The BL and BC methods had higher error rates. This suggests that in order to properly measure branch support in a species tree context, it is important to account for the uncertainty involved in reconstructing gene trees from DNA sequences as well as that involved in reconstructing the species tree from individual gene trees. With the parameters used in the simulations, sampling more individuals per species improved support values about as much as sampling more genes did. Moreover, sampling more individuals per species appeared to be important for escaping the anomaly zone present when only 1 sequence was sampled. We also apply the 4 methods to obtain branch supports for the species phylogeny of diploid wild roses (Rosa) in North America.
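The resampling and support calculation described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names (`resample_lineages`, `resample_characters`, `branch_support`), the dictionary layout of the alignment, and the representation of a species tree as a set of clades are all assumptions made for illustration. The step that reconstructs a species tree from each resampled data set (gene tree parsimony in the study) is omitted, so the replicate trees in the example are hand-written placeholders.

```python
import random
from collections import Counter


def resample_lineages(seqs_by_species, rng):
    """BL step (assumed layout: species -> list of aligned sequences).

    Draws sequences with replacement within each species, keeping the
    number of sequences per species unchanged.
    """
    return {sp: [rng.choice(seqs) for _ in seqs]
            for sp, seqs in seqs_by_species.items()}


def resample_characters(seqs_by_species, rng):
    """BC step: draws alignment columns with replacement for one gene.

    The same column draw is applied to every sequence, as in the
    standard nonparametric bootstrap.
    """
    length = len(next(iter(seqs_by_species.values()))[0])
    cols = [rng.randrange(length) for _ in range(length)]
    return {sp: ["".join(s[c] for c in cols) for s in seqs]
            for sp, seqs in seqs_by_species.items()}


def branch_support(replicate_trees):
    """Percentage of replicate species trees in which each clade occurs.

    Each tree is assumed to be a set of clades, and each clade a
    frozenset of species names.
    """
    n = len(replicate_trees)
    counts = Counter(clade for tree in replicate_trees for clade in tree)
    return {clade: 100.0 * k / n for clade, k in counts.items()}


# BLC applies both resampling steps to the same gene.
rng = random.Random(0)
gene = {"sp1": ["ACGT", "ACGA"], "sp2": ["TCGT"]}
blc_replicate = resample_characters(resample_lineages(gene, rng), rng)

# Hand-written placeholder replicates standing in for reconstructed trees.
trees = [
    {frozenset("AB"), frozenset("ABC")},
    {frozenset("AB"), frozenset("ABC")},
    {frozenset("AC"), frozenset("ABC")},
    {frozenset("AB"), frozenset("ABC")},
]
support = branch_support(trees)
print(support[frozenset("AB")])  # 75.0
```

In this toy example the clade {A, B} appears in 3 of the 4 replicate trees, so its support is 75%. PPGTS is not shown, because it replaces the character bootstrap with gene trees sampled from a posterior distribution, which requires a Bayesian phylogenetic analysis.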