Department of Ornithology, American Museum of Natural History, Central Park West at 79th Street, New York, NY 10024, USA.
Department of Ornithology, Academy of Natural Sciences of Drexel University, 1900 Benjamin Franklin Parkway, Philadelphia, PA 19103, USA.
Syst Biol. 2023 May 19;72(1):228-241. doi: 10.1093/sysbio/syac055.
Gene tree discordance is expected in phylogenomic trees and biological processes are often invoked to explain it. However, heterogeneous levels of phylogenetic signal among individuals within data sets may cause artifactual sources of topological discordance. We examined how the information content in tips and subclades impacts topological discordance in the parrots (Order: Psittaciformes), a diverse and highly threatened clade of nearly 400 species. Using ultraconserved elements from 96% of the clade's species-level diversity, we estimated concatenated and species trees for 382 ingroup taxa. We found that discordance among tree topologies was most common at nodes dating between the late Miocene and Pliocene, and often at the taxonomic level of the genus. Accordingly, we used two metrics to characterize information content in tips and assess the degree to which conflict between trees was being driven by lower-quality samples. Most instances of topological conflict and nonmonophyletic genera in the species tree could be objectively identified using these metrics. For subclades still discordant after tip-based filtering, we used a machine learning approach to determine whether phylogenetic signal or noise was the more important predictor of metrics supporting the alternative topologies. We found that when signal favored one of the topologies, the noise was the most important variable in poorly performing models that favored the alternative topology. In sum, we show that artifactual sources of gene tree discordance, which are likely a common phenomenon in many data sets, can be distinguished from biological sources by quantifying the information content in each tip and modeling which factors support each topology. [Historical DNA; machine learning; museomics; Psittaciformes; species tree.].