Rokas Antonis, Carroll Sean B
Howard Hughes Medical Institute and Laboratory of Molecular Biology, University of Wisconsin-Madison, USA.
Mol Biol Evol. 2005 May;22(5):1337-44. doi: 10.1093/molbev/msi121. Epub 2005 Mar 2.
The relative contribution of taxon number and gene number to accuracy in phylogenetic inference is a major issue in phylogenetics and of central importance to the choice of experimental strategies for the successful reconstruction of a broad sketch of the tree of life. Maximization of the number of taxa sampled is the strategy favored by most phylogeneticists, although its necessity remains the subject of debate. Vast increases in gene number are now possible due to advances in genomics, but large numbers of genes will be available for only modest numbers of taxa, raising the question of whether such genome-scale phylogenies will be robust to the addition of taxa. To examine the relative benefit of increasing taxon number or gene number to phylogenetic accuracy, we have developed an assay that utilizes the symmetric difference tree distance as a measure of phylogenetic accuracy. We have applied this assay to a genome-scale data matrix containing 106 genes from 14 yeast species. Our results show that increasing taxon number correlates with a slight decrease in phylogenetic accuracy. In contrast, increasing gene number has a significant positive effect on phylogenetic accuracy. Analyses of an additional taxon-rich data matrix from the same yeast clade show that taxon number does not have a significant effect on phylogenetic accuracy. The positive effect of gene number and the lack of effect of taxon number on phylogenetic accuracy are also corroborated by analyses of two data matrices from mammals and angiosperm plants, respectively. We conclude that, for typical data sets, the number of genes utilized may be a more important determinant of phylogenetic accuracy than taxon number.
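The symmetric difference tree distance used above as the accuracy measure can be illustrated with a minimal sketch. It is not the authors' assay. It assumes trees are written as nested Python tuples of taxon names, and it counts the nontrivial bipartitions (splits) found in exactly one of two trees on the same taxa. The names `leaves`, `splits`, and `symmetric_difference` are hypothetical helpers introduced here for illustration.

```python
from typing import FrozenSet, Set, Tuple, Union

# Assumed representation: a leaf is a taxon name, an internal node is a tuple of subtrees.
Tree = Union[str, Tuple["Tree", ...]]


def leaves(tree: Tree) -> FrozenSet[str]:
    """Return the set of taxon names at the tips of a tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def splits(tree: Tree) -> Set[FrozenSet[str]]:
    """Return the nontrivial bipartitions of a tree, treated as unrooted.

    Each split is stored as the side that does NOT contain a fixed anchor
    taxon, so the same split is recorded identically however the tree is rooted.
    """
    all_taxa = leaves(tree)
    anchor = min(all_taxa)
    found: Set[FrozenSet[str]] = set()

    def visit(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):
            return frozenset([node])
        clade = frozenset().union(*(visit(child) for child in node))
        side = all_taxa - clade if anchor in clade else clade
        # Skip trivial splits that separate fewer than two taxa from the rest.
        if 2 <= len(side) <= len(all_taxa) - 2:
            found.add(side)
        return clade

    visit(tree)
    return found


def symmetric_difference(tree_a: Tree, tree_b: Tree) -> int:
    """Count the splits present in exactly one of the two trees."""
    if leaves(tree_a) != leaves(tree_b):
        raise ValueError("both trees must contain the same taxa")
    return len(splits(tree_a) ^ splits(tree_b))


reference = ((("A", "B"), "C"), ("D", "E"))
estimate = ((("A", "C"), "B"), ("D", "E"))
print(symmetric_difference(reference, estimate))
```

A distance of 0 means the two topologies agree on every split; larger values mean more disagreement. In an assay of this kind, a tree inferred from a subset of genes or taxa would be compared against a reference topology, so a smaller distance corresponds to higher accuracy.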