Daniel F. and Ada L. Rice Plant Conservation Science Center, Chicago Botanic Garden, Glencoe, IL.
Department of Biological Sciences, University of Arkansas, Fayetteville, AR.
Mol Biol Evol. 2018 Jan 1;35(1):80-93. doi: 10.1093/molbev/msx268.
Diatoms (Bacillariophyta) are a species-rich group of eukaryotic microbes diverse in morphology, ecology, and metabolism. Previous reconstructions of the diatom phylogeny based on one or a few genes have resulted in inconsistent resolution or low support for critical nodes. We applied phylogenetic paralog pruning techniques to a data set of 94 diatom genomes and transcriptomes to infer perennially difficult species relationships, using concatenation and summary-coalescent methods to reconstruct species trees from data sets spanning a wide range of thresholds for taxon and column occupancy in gene alignments. Conflicts between gene and species trees decreased with both increasing taxon occupancy and bootstrap cutoffs applied to gene trees. Concordance between gene and species trees was lowest for short internodes and increased logarithmically with increasing edge length, suggesting that incomplete lineage sorting disproportionately affects species tree inference at short internodes, which are a common feature of the diatom phylogeny. Although species tree topologies were largely consistent across many data treatments, concatenation methods appeared to outperform summary-coalescent methods for sparse alignments. Our results underscore that approaches to species-tree inference based on few loci are likely to be misled by unrepresentative sampling of gene histories, particularly in lineages that may have diversified rapidly. In addition, phylogenomic studies of diatoms, and potentially other hyperdiverse groups, should maximize the number of gene trees with high taxon occupancy, though there is clearly a limit to how many of these genes will be available.
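The abstract names taxon occupancy and column occupancy thresholds as the main axes along which data sets were built, but it does not spell out the filtering procedure. Below is a minimal Python sketch of one way such a filter could work. It assumes that column occupancy is the fraction of taxa in an alignment with a non-gap character at a given column, and that taxon occupancy is the fraction of all taxa in the study (94 here) that are present in a gene alignment. The function names, the set of missing-data symbols, the order of operations (trim columns first, then score taxon occupancy), and the toy alignments are hypothetical and for illustration only. This is not the authors' pipeline.

```python
from typing import Dict

# Assumption: '-' (gap) and '?' (unknown) count as missing data.
GAP_CHARS = {"-", "?"}

Alignment = Dict[str, str]  # taxon name -> aligned sequence


def column_occupancy(aln: Alignment, col: int) -> float:
    """Fraction of sequences in `aln` with a non-missing character at `col`."""
    present = sum(1 for seq in aln.values() if seq[col] not in GAP_CHARS)
    return present / len(aln)


def trim_columns(aln: Alignment, min_col_occ: float) -> Alignment:
    """Keep only alignment columns whose occupancy is at least `min_col_occ`."""
    if not aln:
        return {}
    length = len(next(iter(aln.values())))
    keep = [i for i in range(length) if column_occupancy(aln, i) >= min_col_occ]
    return {taxon: "".join(seq[i] for i in keep) for taxon, seq in aln.items()}


def has_data(seq: str) -> bool:
    """True if the sequence contains at least one non-missing character."""
    return any(c not in GAP_CHARS for c in seq)


def taxon_occupancy(aln: Alignment, total_taxa: int) -> float:
    """Fraction of all study taxa that have data in this alignment."""
    return sum(1 for seq in aln.values() if has_data(seq)) / total_taxa


def filter_alignments(alignments: Dict[str, Alignment], total_taxa: int,
                      min_col_occ: float, min_taxon_occ: float) -> Dict[str, Alignment]:
    """Trim sparse columns, drop emptied taxa, then keep genes meeting the taxon threshold."""
    kept = {}
    for gene, aln in alignments.items():
        trimmed = trim_columns(aln, min_col_occ)
        # Taxa left with no data after column trimming are removed.
        trimmed = {t: s for t, s in trimmed.items() if has_data(s)}
        if trimmed and taxon_occupancy(trimmed, total_taxa) >= min_taxon_occ:
            kept[gene] = trimmed
    return kept


# Hypothetical toy data (not from the study): 2 genes, 4 taxa in total.
toy = {
    "geneA": {"t1": "ACGT", "t2": "AC-T", "t3": "A--T"},
    "geneB": {"t1": "AC--", "t2": "----", "t3": "----"},
}
kept = filter_alignments(toy, total_taxa=4, min_col_occ=0.5, min_taxon_occ=0.5)
print(kept)  # {'geneA': {'t1': 'ACT', 't2': 'ACT', 't3': 'A-T'}}
```

Sweeping `min_col_occ` and `min_taxon_occ` over a grid of values produces a family of data sets of the kind the abstract describes. Higher taxon-occupancy thresholds keep fewer genes, but the genes that remain are more complete.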