Department of Entomology, Cornell University, Comstock Hall, Ithaca, NY 14853, USA.
Department of Entomology, National Museum of Natural History, Smithsonian Institution, Washington, DC 20560, USA.
Syst Biol. 2021 Jun 16;70(4):803-821. doi: 10.1093/sysbio/syaa097.
Summarizing individual gene trees into species phylogenies with two-step coalescent methods is now a standard strategy in phylogenomics. However, practical implementations of summary methods suffer from gene tree estimation error, which arises from various biological and analytical factors. The choice of gene tree inference method, and its downstream effects on species tree estimation for empirical data sets, remains greatly understudied. To better understand how this choice affects gene and species tree accuracy, we compare gene trees estimated with four widely used programs under different model-selection criteria: PhyloBayes, MrBayes, IQ-Tree, and RAxML. We study their performance in a phylogenomic framework of more than 800 ultraconserved elements from the bee subfamily Nomiinae (Halictidae). Our taxon sampling focuses on the genus Pseudapis, a distinct lineage with diverse morphological features but contentious morphology-based taxonomic classifications and no molecular phylogenetic guidance. We approximate the topological accuracy of gene trees by assessing their ability to recover two uncontroversial monophyletic groups, and we compare branch lengths of individual trees using the stemminess metric (the relative length of internal branches). We further examine different strategies for removing uninformative loci and for collapsing weakly supported nodes into polytomies. We then summarize gene trees with ASTRAL and compare the resulting species phylogenies, including comparisons to concatenation-based estimates. Gene trees obtained with the reversible jump model search in MrBayes were the most concordant on average, and all Bayesian methods yielded gene trees with better stemminess values. However, the only gene tree estimation approach whose ASTRAL summary trees consistently produced the most likely correct topology was IQ-Tree with automated model selection (ModelFinder).
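Three of the gene-tree diagnostics described above (a branch-length ratio, a monophyly test for reference clades, and collapsing weakly supported nodes into polytomies) can be illustrated with a short Python sketch. It is not the authors' pipeline: the `Node` class and all function names are hypothetical, trees are built by hand rather than parsed from Newick, and `internal_fraction` computes the share of total tree length on internal branches (a treeness-style ratio), which may differ from the exact stemminess formula used in the study. Monophyly is treated here as a bipartition of the unrooted gene tree, and the support threshold is an arbitrary user choice.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Set


@dataclass
class Node:
    """Minimal rooted representation of a gene tree (hypothetical helper class)."""
    name: str = ""                    # leaf label; empty for internal nodes
    length: float = 0.0               # length of the branch leading to this node
    support: Optional[float] = None   # e.g. bootstrap or posterior; internal nodes only
    children: List["Node"] = field(default_factory=list)

    def is_leaf(self) -> bool:
        return not self.children


def iter_nodes(node: Node) -> Iterator[Node]:
    """Yield every node of the subtree in preorder."""
    yield node
    for child in node.children:
        yield from iter_nodes(child)


def leaf_names(node: Node) -> Set[str]:
    return {n.name for n in iter_nodes(node) if n.is_leaf()}


def internal_fraction(root: Node) -> float:
    """Share of total tree length lying on internal branches.
    The root has no subtending branch, so its length is ignored."""
    internal = total = 0.0
    for n in iter_nodes(root):
        if n is root:
            continue
        total += n.length
        if not n.is_leaf():
            internal += n.length
    return internal / total if total > 0 else 0.0


def is_monophyletic(root: Node, taxa: Set[str]) -> bool:
    """True if `taxa` is one side of a bipartition of the unrooted tree.
    Each split appears as some non-root node's leaf set or its complement.
    (Quadratic in tree size; acceptable for a sketch.)"""
    complement = leaf_names(root) - taxa
    for n in iter_nodes(root):
        if n is not root and leaf_names(n) in (taxa, complement):
            return True
    return False


def collapse_weak(node: Node, threshold: float) -> None:
    """Collapse, in place, internal branches whose support is below `threshold`.
    The collapsed branch (and its length) is removed and its children are
    attached to the parent, creating a polytomy."""
    kept: List[Node] = []
    for child in node.children:
        collapse_weak(child, threshold)  # resolve deeper nodes first
        weak = (not child.is_leaf() and child.support is not None
                and child.support < threshold)
        if weak:
            kept.extend(child.children)  # splice grandchildren into this node
        else:
            kept.append(child)
    node.children = kept


if __name__ == "__main__":
    # Toy tree (A,(B,C)95,(D,E)30); branch lengths are made up for illustration.
    tree = Node(children=[
        Node("A", 0.1),
        Node(length=0.2, support=95, children=[Node("B", 0.1), Node("C", 0.1)]),
        Node(length=0.05, support=30, children=[Node("D", 0.1), Node("E", 0.1)]),
    ])
    print(round(internal_fraction(tree), 3))   # 0.333
    print(is_monophyletic(tree, {"B", "C"}))    # True
    collapse_weak(tree, threshold=50)
    print(is_monophyletic(tree, {"D", "E"}))    # False: (D,E) was collapsed
    print(len(tree.children))                   # 4
```

Design note: the collapsed branch's length is simply dropped rather than redistributed to its children; other tools may handle branch lengths differently when creating polytomies.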
We discuss these findings and provide practical advice on gene tree estimation for summary methods. Lastly, we establish the first phylogeny-informed classification for Pseudapis s. l. and map the distribution of distinct morphological features of the group. [ASTRAL; Bees; concordance; gene tree estimation error; IQ-Tree; MrBayes; Nomiinae; PhyloBayes; RAxML; phylogenomics; stemminess].