Department of Ecology and Evolutionary Biology, Museum of Zoology, University of Michigan, Ann Arbor, MI 48109-1079, USA.
Syst Biol. 2010 Oct;59(5):573-83. doi: 10.1093/sysbio/syq047. Epub 2010 Sep 10.
Discord among the estimated gene trees of different loci can be attributed to both the process of mutation and incomplete lineage sorting. Effectively modeling these two sources of variation (mutational and coalescent variance) poses two distinct challenges for phylogenetic studies. Despite extensive investigation of mutational models for gene-tree estimation over the past two decades and recent attention to modeling the coalescent process for phylogenetic estimation, the effects of these two variances have yet to be evaluated simultaneously. Here, we partition the effects of mutational and coalescent processes on phylogenetic accuracy by comparing the accuracy of species trees estimated from gene trees (i.e., the actual coalescent genealogies) with that of species trees estimated from estimated gene trees (i.e., trees estimated from nucleotide sequences, which contain both coalescent and mutational variance). Not only do both mutational and coalescent variance contribute significantly to errors in species-tree estimates, but the relative magnitude of their effects on the accuracy of species-tree estimation also differs systematically depending on 1) the timing of divergence, 2) the sampling design, and 3) the method used for species-tree estimation. These findings explain why using more of the information contained in gene trees (e.g., topology and branch lengths as opposed to just topology) does not necessarily translate into pronounced gains in accuracy, highlighting the strengths and limits of different methods for species-tree estimation. Differences in accuracy scores between methods for different sampling regimes also emphasize that it would be a mistake to assume that more computationally intensive species-tree estimation procedures will always provide better estimates of species trees. To the contrary, the performance of a method depends not only on the method per se but also on the compatibility between the input genetic data and the method, as determined by the relative impact of mutational and coalescent variance.