Brown Jeremy M, Thomson Robert C
Department of Biological Sciences and Museum of Natural Science, Louisiana State University, 202 Life Science Building, Baton Rouge, LA 70803, USA.
Department of Biology, University of Hawai'i at Manoa, 2538 McCarthy Mall, Edmondson Hall Rm 216, Honolulu, HI 96822, USA.
Syst Biol. 2017 Jul 1;66(4):517-530. doi: 10.1093/sysbio/syw101.
As the application of genomic data in phylogenetics has become routine, a number of cases have arisen where alternative data sets strongly support conflicting conclusions. This sensitivity to analytical decisions has prevented firm resolution of some of the most recalcitrant nodes in the tree of life. To better understand the causes and nature of this sensitivity, we analyzed several phylogenomic data sets using an alternative measure of topological support (the Bayes factor) that both demonstrates and averts several limitations of more frequently employed support measures (such as Markov chain Monte Carlo estimates of posterior probabilities). Bayes factors reveal important, previously hidden, differences across six "phylogenomic" data sets collected to resolve the phylogenetic placement of turtles within Amniota. These data sets vary substantially in their support for well-established amniote relationships, particularly in the proportion of genes that contain extreme amounts of information as well as the proportion that strongly reject these uncontroversial relationships. All six data sets contain little information to resolve the phylogenetic placement of turtles relative to other amniotes. Bayes factors also reveal that a very small number of extremely influential genes (less than 1% of genes in a data set) can fundamentally change significant phylogenetic conclusions. In one example, these genes are shown to contain previously unrecognized paralogs. This study demonstrates both that the resolution of difficult phylogenomic problems remains sensitive to seemingly minor analysis details and that Bayes factors are a valuable tool for identifying and solving these challenges.
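As a rough illustration of how a per-gene Bayes factor can flag genes that strongly support or strongly reject a relationship, the sketch below computes 2 ln BF from two log marginal likelihoods and sorts genes into three bins. It is a minimal sketch under stated assumptions, not the authors' pipeline: the gene names, the log marginal likelihood values, the function names, and the cutoff of 10 (a commonly used "very strong" threshold on the 2 ln BF scale) are all hypothetical choices made for this example, and the study itself may have estimated its Bayes factors differently.

```python
from typing import Dict, List, Tuple


def two_ln_bf(log_ml_h1: float, log_ml_h2: float) -> float:
    """Return 2 ln BF for hypothesis H1 over H2.

    BF = P(D | H1) / P(D | H2), so on the log scale the Bayes factor
    is simply a difference of log marginal likelihoods.
    """
    return 2.0 * (log_ml_h1 - log_ml_h2)


def classify_genes(
    log_mls: Dict[str, Tuple[float, float]],
    threshold: float = 10.0,  # assumed cutoff, not taken from the abstract
) -> Dict[str, List[str]]:
    """Bin genes by the strength and direction of their support for H1.

    log_mls maps a gene name to (log marginal likelihood under H1,
    log marginal likelihood under H2).
    """
    bins: Dict[str, List[str]] = {"support": [], "reject": [], "weak": []}
    for gene, (h1, h2) in log_mls.items():
        score = two_ln_bf(h1, h2)
        if score > threshold:
            bins["support"].append(gene)
        elif score < -threshold:
            bins["reject"].append(gene)
        else:
            bins["weak"].append(gene)
    return bins


# Hypothetical values, for illustration only.
example = {
    "gene_a": (-1000.0, -1012.0),  # 2 ln BF = 24: strongly supports H1
    "gene_b": (-2000.0, -1999.0),  # 2 ln BF = -2: little information
    "gene_c": (-1500.0, -1480.0),  # 2 ln BF = -40: strongly rejects H1
}
result = classify_genes(example)
```

Here H1 might be a tree constrained to contain a well-established amniote relationship and H2 a tree constrained to lack it. A gene landing in the "reject" bin for an uncontroversial relationship would then be a candidate for closer inspection, for example for unrecognized paralogs.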