Shen Xing-Xing, Hittinger Chris Todd, Rokas Antonis
Department of Biological Sciences, Vanderbilt University, Nashville, Tennessee 37235, USA.
Laboratory of Genetics, Genome Center of Wisconsin, DOE Great Lakes Bioenergy Research Center, Wisconsin Energy Institute, J. F. Crow Institute for the Study of Evolution, University of Wisconsin-Madison, Madison, Wisconsin 53706, USA.
Nat Ecol Evol. 2017 Apr 10;1(5):126. doi: 10.1038/s41559-017-0126.
Phylogenomic studies have resolved countless branches of the tree of life, but remain strongly contradictory on certain contentious relationships. Here, we use a maximum likelihood framework to quantify the distribution of phylogenetic signal among genes and sites for 17 contentious branches and 6 well-established control branches in plant, animal and fungal phylogenomic data matrices. We find that resolution in some of these 17 branches rests on a single gene or a few sites, and that removal of a single gene in concatenation analyses or a single site from every gene in coalescence-based analyses diminishes support and can alter the inferred topology. These results suggest that tiny subsets of very large data matrices drive the resolution of specific internodes, providing a dissection of the distribution of support and observed incongruence in phylogenomic analyses. We submit that quantifying the distribution of phylogenetic signal in phylogenomic data is essential for evaluating whether branches, especially contentious ones, are truly resolved. Finally, we offer one detailed example of such an evaluation for the controversy regarding the earliest-branching metazoan phylum, for which examination of the distributions of gene-wise and site-wise phylogenetic signal across eight data matrices consistently supports ctenophores as the sister group to all other metazoans.
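The abstract does not spell out the computation, but the gene-wise and site-wise comparison it describes can be illustrated with a short sketch. This is a minimal, hypothetical example, not the authors' pipeline. It assumes that per-site log-likelihoods for one concatenated alignment have already been computed under two competing topologies, T1 and T2 (for instance with a maximum likelihood program such as RAxML or IQ-TREE), and that the site range of each gene is known. The function names, the `SupportSummary` type and the toy numbers are illustrative assumptions, not data from the study. A positive difference favours T1. Summing site differences within a gene gives a gene-wise difference, and dropping the gene with the largest absolute difference shows whether a single gene drives the overall preference.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple


def site_deltas(lnl_t1: Sequence[float], lnl_t2: Sequence[float]) -> List[float]:
    """Per-site difference lnL(site | T1) - lnL(site | T2); > 0 favours T1.

    Assumes both vectors come from the same alignment, in the same site order.
    """
    if len(lnl_t1) != len(lnl_t2):
        raise ValueError("per-site log-likelihood vectors must be the same length")
    return [a - b for a, b in zip(lnl_t1, lnl_t2)]


def gene_deltas(site_d: Sequence[float],
                partitions: Dict[str, Tuple[int, int]]) -> Dict[str, float]:
    """Sum site-wise differences within each gene.

    `partitions` (hypothetical format) maps a gene name to a half-open
    (start, end) range of site indices in the concatenated alignment.
    """
    return {gene: sum(site_d[start:end]) for gene, (start, end) in partitions.items()}


@dataclass
class SupportSummary:
    total: float              # summed difference over all genes (> 0 favours T1)
    genes_for_t1: int         # genes with a positive difference
    genes_for_t2: int         # genes with a negative difference
    top_gene: str             # gene with the largest absolute difference
    total_without_top: float  # summed difference after dropping that gene


def summarize(gene_d: Dict[str, float]) -> SupportSummary:
    """Summarize how support for T1 vs T2 is distributed among genes."""
    total = sum(gene_d.values())
    top = max(gene_d, key=lambda g: abs(gene_d[g]))
    return SupportSummary(
        total=total,
        genes_for_t1=sum(1 for d in gene_d.values() if d > 0),
        genes_for_t2=sum(1 for d in gene_d.values() if d < 0),
        top_gene=top,
        total_without_top=sum(d for g, d in gene_d.items() if g != top),
    )


if __name__ == "__main__":
    # Toy per-site log-likelihoods for 9 sites in 3 genes (invented values).
    lnl_t2 = [-10.0] * 9
    lnl_t1 = [-11.0, -10.5, -9.5, -10.5, -10.5, -10.0, -6.0, -10.5, -9.5]
    parts = {"geneA": (0, 3), "geneB": (3, 6), "geneC": (6, 9)}
    print(summarize(gene_deltas(site_deltas(lnl_t1, lnl_t2), parts)))
    # SupportSummary(total=2.0, genes_for_t1=1, genes_for_t2=2, top_gene='geneC', total_without_top=-2.0)
```

In the toy input, two of the three genes favour T2, yet the concatenated total favours T1 because one gene contributes a large difference; removing that gene reverses the sign. This mirrors, on invented numbers, the kind of single-gene effect on concatenation analyses that the abstract reports.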