Reeves Patrick A, Richards Christopher M
United States Department of Agriculture, Agricultural Research Service, National Center for Genetic Resources Preservation, 1111 South Mason Street, Fort Collins, Colorado 80521, USA.
Syst Biol. 2007 Apr;56(2):302-20. doi: 10.1080/10635150701324225.
Hybridization is a well-documented, natural phenomenon that is common at low taxonomic levels in the higher plants and other groups. In spite of the obvious potential for gene flow via hybridization to cause reticulation in an evolutionary tree, analytical methods based on a strictly bifurcating model of evolution have frequently been applied to data sets containing taxa known to hybridize in nature. Using simulated data, we evaluated the relative performance of phenetic, tree-based, and network approaches for distinguishing between taxa with known reticulate history and taxa that were true terminal monophyletic groups. In all methods examined, type I error (the erroneous rejection of the null hypothesis that a taxon of interest is not monophyletic) was likely during the early stages of introgressive hybridization. We used the gradual erosion of type I error with continued gene flow as a metric for assessing relative performance. Bifurcating tree-based methods performed poorly, with highly supported, incorrect topologies appearing during some phases of the simulation. Based on our model, we estimate that many thousands of gene flow events may be required in natural systems before reticulate taxa will be reliably detected using tree-based methods of phylogeny reconstruction. We conclude that the use of standard bifurcating tree-based methods to identify terminal monophyletic groups for the purposes of defining or delimiting phylogenetic species, or for prioritizing populations for conservation purposes, is difficult to justify when gene flow between sampled taxa is possible.
As an alternative, we explored the use of two network methods. Minimum spanning networks performed worse than most tree-based methods and did not yield topologies that were easily interpretable as phylogenies. The performance of NeighborNet was comparable to parsimony bootstrap analysis. NeighborNet and reverse successive weighting were capable of identifying an ephemeral signature of reticulate evolution during the early stages of introgression by revealing conflicting phylogenetic signal. However, when gene flow was topologically complex, the conflicting phylogenetic signal revealed by these methods resulted in a high probability of type II error (inferring that a monophyletic taxon has a reticulate history). Lastly, we present a novel application of an existing nonparametric clustering procedure that, when used against a density landscape derived from principal coordinate data, showed superior performance to the tree-based and network procedures tested.
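The abstract does not name the clustering procedure or describe how the density landscape was built, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes classical principal coordinate analysis of a pairwise distance matrix, a Gaussian kernel density estimate over the resulting coordinates, and a simple hill-climbing rule that assigns each individual to the density peak it reaches. The function names, the `bandwidth` parameter, and the toy data are all hypothetical.

```python
import numpy as np


def pcoa(dist, n_axes=2):
    """Classical principal coordinate analysis of a square distance matrix."""
    d = np.asarray(dist, dtype=float)
    n = d.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    # Double-centre the squared distances (Gower's transformation).
    gower = -0.5 * centering @ (d ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gower)
    order = np.argsort(eigvals)[::-1][:n_axes]
    # Clip tiny negative eigenvalues that arise from rounding.
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None))
    return eigvecs[:, order] * scale


def density_mode_clusters(coords, bandwidth):
    """Label each point by the density peak reached through hill climbing.

    Assumption: `bandwidth` sets both the Gaussian kernel width and the
    neighbourhood radius searched at each uphill step.
    """
    sq = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(axis=2)
    density = np.exp(-sq / (2.0 * bandwidth ** 2)).sum(axis=1)
    # Each point steps to the densest point within one bandwidth (or stays put).
    reachable = sq <= bandwidth ** 2
    parent = np.where(reachable, density[None, :], -np.inf).argmax(axis=1)
    # Follow the uphill pointers until every point sits on a peak.
    root = parent.copy()
    while True:
        nxt = parent[root]
        if np.array_equal(nxt, root):
            break
        root = nxt
    _, labels = np.unique(root, return_inverse=True)
    return labels


# Toy data: two hypothetical taxa, each a 3 x 3 grid of individuals, 20 units apart.
grid = np.array([(x, y) for x in (-1, 0, 1) for y in (-1, 0, 1)], dtype=float)
points = np.vstack([grid, grid + [20.0, 0.0]])
dist = np.sqrt(((points[:, None, :] - points[None, :, :]) ** 2).sum(axis=2))
labels = density_mode_clusters(pcoa(dist), bandwidth=1.5)
```

In this sketch, well-separated taxa occupy separate density peaks and receive different labels. Under sustained gene flow one would expect intermediate individuals to fill the valley between peaks until the two groups merge into one, but the result depends heavily on the bandwidth chosen, which is an assumption here rather than something taken from the paper.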