Division of Vertebrate Zoology, American Museum of Natural History, New York, NY 10024, USA; Sackler Institute for Comparative Genomics, American Museum of Natural History, New York, NY 10024, USA.
Department of Biology, Colorado State University, Fort Collins, CO 80523, USA.
Mol Phylogenet Evol. 2019 Oct;139:106539. doi: 10.1016/j.ympev.2019.106539. Epub 2019 Jun 18.
Genomic datasets sometimes support conflicting phylogenetic relationships when different tree-building methods are applied. Coherent interpretations of such results are enabled by partitioning support for controversial relationships among the constituent genes of a phylogenomic dataset. For the supermatrix (=concatenation) approach, several methods that measure the distribution of support and conflict among loci were introduced over 15 years ago. More recently, partitioned coalescence support (PCS) was developed for phylogenetic coalescence methods that account for incomplete lineage sorting and use the summed fits of gene trees to estimate the species tree. Here, we automate computation of PCS to permit application of this index to genome-scale matrices that include hundreds of loci. Reanalyses of four phylogenomic datasets for amniotes, land plants, skinks, and angiosperms demonstrate how PCS scores can be used to: (1) compare conflicting results favored by alternative coalescence methods, (2) identify outlier gene trees that have a disproportionate influence on the resolution of contentious relationships, (3) assess the effects of missing data in species-tree analysis, and (4) clarify biases in commonly implemented coalescence methods and support indices. We show that key phylogenomic conclusions from these analyses often hinge on just a few gene trees and that results can be driven by specific biases of a particular coalescence method and/or the differential weight placed on gene trees with high versus low taxon sampling. The attribution of exceptionally high weight to some gene trees and very low weight to other gene trees counters the basic logic of phylogenomic coalescence analysis; even clades in species trees with high support according to commonly used indices (likelihood-ratio test, bootstrap, Bayesian local posterior probability) can be unstable to the removal of only one or two gene trees with high PCS.
Computer simulations cannot adequately describe all of the contingencies and complexities of empirical genetic data. PCS scores complement simulation work by providing specific insights into a particular dataset given the assumptions of the phylogenetic coalescence method that is applied. In combination with standard measures of nodal support, PCS provides a more complete understanding of the overall genomic evidence for contested evolutionary relationships in species trees.
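The idea of partitioning support among gene trees can be illustrated with a minimal sketch. It rests on assumptions that the abstract does not spell out: that each gene tree has a numeric fit to a species tree (for example a quartet count or a pseudo-likelihood, depending on the coalescence method), and that the PCS of a gene for a clade is its fit to the species tree containing the clade minus its fit to the best alternative lacking the clade. The function names, gene labels, and fit values below are hypothetical and are not taken from the authors' software or data.

```python
from typing import Dict, List


def partitioned_support(fit_with: Dict[str, int],
                        fit_without: Dict[str, int]) -> Dict[str, int]:
    """Per-gene support for a clade (assumed definition, see lead-in).

    fit_with[g]    : fit of gene tree g to the species tree that has the clade
    fit_without[g] : fit of gene tree g to the best species tree lacking it
    A positive score favors the clade; a negative score conflicts with it.
    """
    return {gene: fit_with[gene] - fit_without[gene] for gene in fit_with}


def genes_to_overturn(pcs: Dict[str, int]) -> List[str]:
    """Greedily drop the highest-scoring genes until summed support is <= 0.

    Simplification: only the two fixed species trees are compared. A real
    reanalysis would re-estimate the species tree after each removal.
    """
    total = sum(pcs.values())
    removed: List[str] = []
    ranked = sorted(pcs.items(), key=lambda item: item[1], reverse=True)
    for gene, score in ranked:
        if total <= 0 or score <= 0:
            break
        total -= score
        removed.append(gene)
    return removed


# Hypothetical fits for four gene trees (illustrative numbers only).
fit_with = {"g1": 10, "g2": 3, "g3": 2, "g4": 1}
fit_without = {"g1": 2, "g2": 4, "g3": 3, "g4": 2}

pcs = partitioned_support(fit_with, fit_without)
total_support = sum(pcs.values())
outliers = genes_to_overturn(pcs)
```

In this toy case a single gene tree contributes all of the positive support while the other three weakly conflict, so removing that one gene tree reverses the summed support. This mirrors, in miniature, the kind of instability the abstract describes.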