Department of Biological Sciences, The George Washington University, 2023 G St. NW, Washington, D.C. 20052, USA; College of Fisheries and Life Science, Shanghai Ocean University, Shanghai 201306, China; and National Systematics Laboratory NMFS/NOAA, Post Office Box 37012, Smithsonian Institution NHB, WC 60, MRC-153, Washington, D.C. 20013-7012, USA.
Syst Biol. 2013 Sep;62(5):763-85. doi: 10.1093/sysbio/syt039. Epub 2013 Jun 8.
Non-homogeneous processes and, in particular, base compositional non-stationarity have long been recognized as a critical source of systematic error. But only a small fraction of current molecular systematic studies methodically examine and effectively account for the potentially confounding effect of non-stationarity. The problem is especially overlooked in multi-locus or phylogenomic scale analyses, in part because no efficient tools exist to accommodate base composition heterogeneity in large data sets. We present a detailed analysis of a data set with 20 genes and 214 taxa to study the phylogeny of flatfishes (Pleuronectiformes) and their position among percomorphs. Most genes vary significantly in base composition among taxa and fail to resolve flatfish monophyly and other emblematic groups, suggesting that non-stationarity may be causing systematic error. We show a strong association between base compositional bias and topological discordance among individual gene partitions and their inferred trees. Phylogenetic methods that apply non-homogeneous models to accommodate non-stationarity have a relatively minor effect on reducing gene tree discordance, suggesting that available computer programs applying these methods do not scale up efficiently to the data set of modest size analysed in this study. By comparing phylogenetic trees obtained with species tree (STAR) and concatenation approaches, we show that gene tree discordance in our data set is more likely due to base compositional biases than to incomplete lineage sorting. Multi-locus analyses suggest that the combined phylogenetic signal from all loci in a concatenated data set overcomes systematic biases induced by non-stationarity at each partition. Finally, relationships among flatfishes and their relatives are discussed in the light of these results.
We find support for the monophyly of flatfishes and confirm findings from previous molecular phylogenetic studies suggesting their close affinity with several carangimorph groups (i.e., jack and allies, barracuda, archerfish, billfish and swordfish, threadfin, moonfish, beach salmon, and snook and barramundi).
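The abstract reports that most genes vary significantly in base composition among taxa but does not say which test was used. As an illustration only, the sketch below applies a standard chi-square test of homogeneity to per-taxon A/C/G/T counts for one gene. The function names and the input format (a dictionary mapping taxon names to aligned sequences) are assumptions made for this example, not the authors' procedure. Because this test treats taxa as independent and ignores shared ancestry, it is best read as a rough screen.

```python
from collections import Counter

BASES = "ACGT"  # gaps and ambiguity codes are ignored (assumption of this sketch)


def base_counts(seq):
    """Return the counts of A, C, G and T in one sequence."""
    counts = Counter(seq.upper())
    return [counts[b] for b in BASES]


def composition_chi2(alignment):
    """Chi-square homogeneity statistic for base composition across taxa.

    alignment: dict mapping taxon name -> sequence (hypothetical input format).
    Returns (chi2, degrees_of_freedom).
    """
    table = [base_counts(seq) for seq in alignment.values()]
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    chi2 = 0.0
    for row, row_total in zip(table, row_totals):
        for observed, col_total in zip(row, col_totals):
            expected = row_total * col_total / grand_total
            if expected > 0:  # skip bases absent from every taxon
                chi2 += (observed - expected) ** 2 / expected

    # Only bases that occur at least once contribute a column.
    used_cols = sum(1 for total in col_totals if total > 0)
    df = (len(table) - 1) * (used_cols - 1)
    return chi2, df


if __name__ == "__main__":
    toy = {"taxon_a": "AACCGGTT", "taxon_b": "AAAACCGT", "taxon_c": "GGGGCCAT"}
    stat, df = composition_chi2(toy)
    print(f"chi2 = {stat:.3f}, df = {df}")
```

A large statistic relative to the chi-square distribution with the returned degrees of freedom would flag a gene as compositionally heterogeneous. Converting the statistic to a p-value needs a chi-square survival function, which this standard-library sketch leaves out.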