Gatesy John, Baker Richard H
Department of Biology, University of California-Riverside, Riverside, CA 92521, USA.
Syst Biol. 2005 Jun;54(3):483-92. doi: 10.1080/10635150590945368.
Combined analysis of multiple phylogenetic data sets can reveal emergent character support that is not evident in separate analyses of individual data sets. Previous parsimony analyses have shown that this hidden support often accounts for a large percentage of the overall phylogenetic signal in cladistic studies. Here, reanalysis of a large comparative genomic data set for yeast (genus Saccharomyces) demonstrates that hidden support can be an important factor in maximum likelihood analyses of multiple data sets as well. Emergent signal in a concatenation of 106 genes was responsible for up to 64% of the likelihood support at a particular node (the difference in log likelihood scores between optimal topologies that included and excluded a supported clade). A grouping of four yeast species (S. cerevisiae, S. paradoxus, S. mikatae, and S. kudriavzevii) was robustly supported by combined analysis of all 106 genes, but separate analyses of individual genes suggested numerous conflicts. Forty-eight genes strictly contradicted S. cerevisiae + S. paradoxus + S. mikatae + S. kudriavzevii in separate analyses, but combined likelihood analyses that included up to 45 of the "wrong" data sets supported this group. Extensive hidden support also emerged in a combined likelihood analysis of 41 genes that each recovered the exact same topology in separate analyses of the individual genes. These results show that isolated analyses of individual data sets can mask congruence and distort interpretations of clade stability, even in strictly model-based phylogenetic methods. Consensus and supertree procedures that ignore hidden phylogenetic signals are, at best, incomplete.
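The likelihood support described above, and the share of it that is hidden, can be sketched numerically. This is a minimal illustration, not the authors' procedure. The abstract defines likelihood support only as a difference in log likelihood scores. Quantifying hidden support as the combined support minus the sum of the separate-analysis supports is an assumption made here, by analogy with hidden branch support in parsimony. All names (`CladeScores`, `likelihood_support`, `hidden_support`, `hidden_fraction`) and all log likelihood values are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class CladeScores:
    """Log likelihoods of the best trees with and without a clade (hypothetical container)."""
    lnl_with: float      # lnL of the optimal topology that includes the clade
    lnl_without: float   # lnL of the optimal topology that excludes the clade


def likelihood_support(scores: CladeScores) -> float:
    """Difference in lnL between the best tree with the clade and the best tree without it."""
    return scores.lnl_with - scores.lnl_without


def hidden_support(combined: CladeScores, separate: Iterable[CladeScores]) -> float:
    """Assumed definition: combined support minus the sum of separate-analysis supports."""
    return likelihood_support(combined) - sum(likelihood_support(s) for s in separate)


def hidden_fraction(combined: CladeScores, separate: Iterable[CladeScores]) -> float:
    """Share of the combined likelihood support that is hidden (only meaningful if support > 0)."""
    total = likelihood_support(combined)
    if total <= 0:
        raise ValueError("clade is not supported in the combined analysis")
    return hidden_support(combined, separate) / total


# Invented values for two genes; gene_b contradicts the clade on its own.
gene_a = CladeScores(lnl_with=-400.0, lnl_without=-403.0)      # support = 3.0
gene_b = CladeScores(lnl_with=-600.0, lnl_without=-599.0)      # support = -1.0
combined = CladeScores(lnl_with=-1000.0, lnl_without=-1010.0)  # support = 10.0

print(hidden_support(combined, [gene_a, gene_b]))   # 8.0
print(hidden_fraction(combined, [gene_a, gene_b]))  # 0.8
```

In this toy case the separate analyses sum to a support of 2.0 while the concatenated analysis yields 10.0, so 80% of the support appears only when the data sets are combined, even though one gene contradicts the clade when analyzed alone.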