Department of Ornithology, American Museum of Natural History, New York, New York.
New York Genome Center, New York, New York.
Genome Biol Evol. 2020 Jul 1;12(7):1131-1147. doi: 10.1093/gbe/evaa113.
The resolution of the Tree of Life has accelerated with advances in DNA sequencing technology. To achieve dense taxon sampling, it is often necessary to obtain DNA from historical museum specimens to supplement modern genetic samples. However, DNA from historical material is generally degraded, which presents various challenges. In this study, we evaluated how coverage at variant sites and missing data among historical and modern samples impact phylogenomic inference. We explored these patterns in the brush-tongued parrots (lories and lorikeets) of Australasia by sampling ultraconserved elements in 105 taxa. Trees estimated with low-coverage characters had several clades in which relationships appeared to be influenced by whether the sample came from historical or modern specimens; these patterns were not observed when more stringent filtering was applied. To assess whether the topologies were affected by missing data, we performed an outlier analysis of sites and loci, and a data reduction approach in which we excluded sites based on data completeness. Depending on the outlier test, 0.15% of total sites or 38% of loci were driving the topological differences among trees, and at these sites, historical samples had 10.9× more missing data than modern ones. In contrast, 70% data completeness was necessary to avoid spurious relationships. Predictive modeling found that outlier analysis scores were correlated with parsimony informative sites in the clades whose topologies changed the most with filtering. After accounting for biased loci and understanding the stability of relationships, we inferred a more robust phylogenetic hypothesis for lories and lorikeets.
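The data reduction step described above, excluding sites based on data completeness, can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names, the set of characters treated as missing (`-`, `?`, `N`), the inclusive threshold, and the toy alignment are all assumptions made for illustration. The only value taken from the abstract is the 70% completeness figure.

```python
# Hypothetical sketch of per-site completeness filtering on a toy alignment.
# Characters counted as missing data are an assumption of this example.
MISSING = set("-?Nn")


def site_completeness(alignment):
    """Return, for each column, the fraction of taxa with a called base."""
    seqs = list(alignment.values())
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all sequences must have the same aligned length")
    n_taxa = len(seqs)
    return [
        sum(s[i] not in MISSING for s in seqs) / n_taxa
        for i in range(length)
    ]


def filter_sites(alignment, min_completeness=0.7):
    """Keep only columns whose completeness is at least min_completeness."""
    completeness = site_completeness(alignment)
    keep = [i for i, c in enumerate(completeness) if c >= min_completeness]
    return {
        name: "".join(seq[i] for i in keep)
        for name, seq in alignment.items()
    }


# Toy data: two "modern" and two "historical" samples (names are made up).
toy = {
    "modern_A": "ACGTAC",
    "modern_B": "ACGTAC",
    "hist_C": "A?G-AC",
    "hist_D": "AC--NC",
}

filtered = filter_sites(toy, min_completeness=0.7)
for name, seq in filtered.items():
    print(name, seq)
```

In this toy case only the fourth column, where both historical samples lack data, falls below the threshold and is dropped. Note that missing characters can remain in retained columns, so filtering by completeness reduces but does not eliminate the imbalance between sample types.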