Simmons Mark P, Miya Masaki
Department of Biology, Colorado State University, Fort Collins, CO 80523, USA.
Mol Phylogenet Evol. 2004 Apr;31(1):351-62. doi: 10.1016/j.ympev.2003.08.004.
Many phylogenetic analyses that include numerous terminals but few genes show high resolution and branch support for relatively recently diverged clades, but lack of resolution and/or support for "basal" clades of the tree. The various benefits of increased taxon and character sampling have been widely discussed in the literature, albeit primarily based on simulations rather than empirical data. In this study, we used a well-sampled gene-tree analysis (based on 100 mitochondrial genomes of higher teleost fishes) to test empirically the efficiency of different methods of data sampling and phylogenetic inference to "correctly" resolve the basal clades of a tree (based on congruence with the reference tree constructed using all 100 taxa and 7990 characters). By itself, increased character sampling was an inefficient method by which to decrease the likelihood of "incorrect" resolution (i.e., incongruence with the reference tree) for parsimony analyses. Although increased taxon sampling was a powerful approach to alleviate "incorrect" resolution for parsimony analyses, it had the general effect of increasing the number of, and support for, "incorrectly" resolved clades in the Bayesian analyses. For both the parsimony and Bayesian analyses, increased taxon sampling, by itself, was insufficient to help resolve the basal clades, making this sampling strategy ineffective for that purpose. For this empirical study, the most efficient of the six approaches considered to resolve the basal clades when adding nucleotides to a dataset that consists of a single gene sampled for a small, but representative, number of taxa, is to increase character sampling and analyze the characters using the Bayesian method.
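The congruence criterion described above (a clade counts as "correctly" resolved when it agrees with the reference tree) can be illustrated with a minimal sketch. This is not the authors' procedure: it assumes rooted trees written as nested tuples of taxon names, and the helper names (`leaves`, `clades`, `conflicts`, `compare_to_reference`) and the toy taxa are hypothetical. The reference tree is restricted to the taxa present in the subsampled tree before clades are compared.

```python
def leaves(tree):
    """Return the set of taxon names under a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def clades(tree, keep=None):
    """Collect non-trivial clades, optionally restricted to the taxa in `keep`.

    Intersecting each clade with `keep` gives the same clades as pruning
    the unsampled taxa from a rooted tree.
    """
    found = set()

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        members = frozenset().union(*(walk(child) for child in node))
        found.add(members)
        return members

    all_taxa = walk(tree)
    if keep is not None:
        keep = frozenset(keep)
        found = {clade & keep for clade in found}
        all_taxa = all_taxa & keep
    # Drop single taxa, empty sets, and the clade containing every taxon.
    return {clade for clade in found if 1 < len(clade) < len(all_taxa)}


def conflicts(a, b):
    """Two rooted clades conflict if they overlap but neither contains the other."""
    return bool(a & b) and not a <= b and not b <= a


def compare_to_reference(test_tree, reference_tree):
    """Split the clades of `test_tree` into congruent and incongruent sets."""
    reference = clades(reference_tree, keep=leaves(test_tree))
    congruent, incongruent = set(), set()
    for clade in clades(test_tree):
        if clade in reference:
            congruent.add(clade)
        elif any(conflicts(clade, ref) for ref in reference):
            incongruent.add(clade)
        # Clades that are merely unresolved in the reference are not counted.
    return congruent, incongruent


# Toy example (hypothetical taxa): the reference has six taxa, the test tree five.
reference_tree = (((("A", "F"), "B"), "C"), ("D", "E"))
test_tree = ((("A", "C"), "B"), ("D", "E"))
good, bad = compare_to_reference(test_tree, reference_tree)
print(len(good), "congruent clade(s);", len(bad), "incongruent clade(s)")
```

Representing clades as sets of taxon names keeps the comparison independent of branch order, and restricting the reference to the sampled taxa lets trees built from different taxon subsets be scored against the same reference.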