Blom Mozes P K, Bragg Jason G, Potter Sally, Moritz Craig
Research School of Biology, Australian National University, Canberra ACT 0200, Australia.
Syst Biol. 2017 May 1;66(3):352-366. doi: 10.1093/sysbio/syw089.
Accurate gene tree inference is an important aspect of species tree estimation in a summary-coalescent framework. Yet, in empirical studies, inferred gene trees differ in accuracy due to stochastic variation in phylogenetic signal between targeted loci. Empiricists should, therefore, examine the consistency of species tree inference, while accounting for the observed heterogeneity in gene tree resolution of phylogenomic data sets. Here, we assess the impact of gene tree estimation error on summary-coalescent species tree inference by screening ~2000 exonic loci based on gene tree resolution prior to phylogenetic inference. We focus on a phylogenetically challenging radiation of Australian lizards (genus Cryptoblepharus, Scincidae) and explore effects on topology and support. We identify a well-supported topology based on all loci and find that a relatively small number of high-resolution gene trees can be sufficient to converge on the same topology. Adding gene trees with decreasing resolution produced a generally consistent topology, and increased confidence for specific bipartitions that were poorly supported when using a small number of informative loci. This corroborates coalescent-based simulation studies that have highlighted the need for a large number of loci to confidently resolve challenging relationships and refutes the notion that low-resolution gene trees introduce phylogenetic noise. Further, our study also highlights the value of quantifying changes in nodal support across locus subsets of increasing size (but decreasing gene tree resolution). Such detailed analyses can reveal anomalous fluctuations in support at some nodes, suggesting the possibility of model violation. By characterizing the heterogeneity in phylogenetic signal among loci, we can account for uncertainty in gene tree estimation and assess its effect on the consistency of the species tree estimate.
We suggest that the evaluation of gene tree resolution should be incorporated in the analysis of empirical phylogenomic data sets. This will ultimately increase our confidence in species tree estimation using summary-coalescent methods and enable us to exploit genomic data for phylogenetic inference. [Coalescence; concatenation; Cryptoblepharus; exon capture; gene tree; phylogenomics; species tree.]
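The screening step described in the abstract, ranking loci by gene tree resolution and then analysing subsets of increasing size but decreasing resolution, can be illustrated with a minimal sketch. The abstract does not state how resolution was measured or which subset sizes were used, so the per-locus score, the locus names, and the function `nested_subsets_by_resolution` below are all hypothetical and serve only to show the nesting logic.

```python
from typing import Dict, List


def nested_subsets_by_resolution(
    resolution: Dict[str, float], sizes: List[int]
) -> List[List[str]]:
    """Return nested locus subsets, best-resolved loci first.

    `resolution` maps a locus name to a single resolution score
    (hypothetical measure; higher means a better-resolved gene tree).
    `sizes` lists the subset sizes to build.
    """
    # Rank loci from highest to lowest score; ties broken by name
    # so the ordering is deterministic.
    ranked = sorted(resolution, key=lambda locus: (-resolution[locus], locus))
    # Each larger subset contains every locus of the smaller ones,
    # plus loci of progressively lower resolution.
    valid_sizes = [n for n in sorted(set(sizes)) if 0 < n <= len(ranked)]
    return [ranked[:n] for n in valid_sizes]


# Illustrative scores only (not data from the study).
scores = {"locus_a": 0.91, "locus_b": 0.40, "locus_c": 0.75, "locus_d": 0.12}
subsets = nested_subsets_by_resolution(scores, [2, 4])
```

Each subset would then be passed to a summary-coalescent species tree method, and topology and nodal support compared across subsets; that step is outside the scope of this sketch.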