New Statistical Criteria Detect Phylogenetic Bias Caused by Compositional Heterogeneity.

Author information

School of Life and Environmental Sciences, University of Sydney, Sydney, NSW, Australia.

Centre for Systems Genomics, University of Melbourne, Melbourne, VIC, Australia.

Publication information

Mol Biol Evol. 2017 Jun 1;34(6):1529-1534. doi: 10.1093/molbev/msx092.

PMID:28333201

Abstract

In statistical phylogenetic analyses of DNA sequences, models of evolutionary change commonly assume that base composition is stationary through time and across lineages. This assumption is violated by many data sets, but it is unclear whether the magnitude of these violations is sufficient to mislead phylogenetic inference. We investigated the impacts of compositional heterogeneity on phylogenetic estimates using a method for assessing model adequacy. Based on a detailed simulation study, we found that common frequentist criteria are highly conservative, such that the model is often rejected when the phylogenetic estimates do not show clear signs of bias. We propose new criteria and provide guidelines for their usage. We apply these criteria to genome-scale data from 40 birds and find that loci with severely non-homogeneous base composition are uncommon. Our results show the importance of using well-informed diagnostic statistics when testing model adequacy for phylogenomic analyses.
