The prevalence of terraced treescapes in analyses of phylogenetic data sets.

Affiliation

Department of Ecology and Evolutionary Biology, University of Arizona, 1041 E. Lowell St, Tucson, AZ, 85721, USA.

Publication information

BMC Evol Biol. 2018 Apr 4;18(1):46. doi: 10.1186/s12862-018-1162-9.

Abstract

BACKGROUND

The pattern of data availability in a phylogenetic data set may lead to the formation of terraces, collections of equally optimal trees. Terraces can arise in tree space if trees are scored with parsimony or with partitioned, edge-unlinked maximum likelihood. Theory predicts that terraces can be large, but their prevalence in contemporary data sets has never been surveyed. We selected 26 data sets and phylogenetic trees reported in recent literature and investigated the terraces to which the trees would belong, under a common set of inference assumptions. We examined terrace size as a function of the sampling properties of the data sets, including taxon coverage density (the proportion of taxon-by-gene positions with any data present) and a measure of gene sampling "sufficiency". We evaluated each data set against the theoretical minimum gene sampling depth needed to reduce terrace size to a single tree, and explored the impact on bootstrap analyses of terraces found among replicate trees.
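Taxon coverage density, as defined above, is the fraction of filled cells in a taxon-by-gene presence/absence matrix. The following is a minimal Python sketch, assuming a hypothetical input format in which `presence[i][j]` is `True` when taxon *i* has any data for gene *j*; the function name `taxon_coverage_density` is illustrative and not taken from the study.

```python
from typing import Sequence


def taxon_coverage_density(presence: Sequence[Sequence[bool]]) -> float:
    """Return the fraction of taxon-by-gene cells that hold any data.

    Hypothetical input format: presence[i][j] is True when taxon i has
    any sequence data for gene j, and False when that cell is missing.
    """
    if not presence or not presence[0]:
        raise ValueError("matrix must have at least one taxon and one gene")
    n_genes = len(presence[0])
    # Every taxon row must describe the same set of genes.
    if any(len(row) != n_genes for row in presence):
        raise ValueError("every taxon row must list the same genes")
    # Count cells with any data (True sums as 1).
    filled = sum(bool(cell) for row in presence for cell in row)
    return filled / (len(presence) * n_genes)


# Example: 4 taxa x 3 genes with 3 missing cells, so 9 / 12 cells are filled.
matrix = [
    [True, True, True],
    [True, False, True],
    [True, True, False],
    [False, True, True],
]
print(taxon_coverage_density(matrix))  # → 0.75
```

The toy matrix's density of 0.75 falls in the range (< 0.90) where terraces were found in nearly all surveyed data sets, although density alone does not determine whether a particular data set induces a terrace.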

RESULTS

Terraces were identified in nearly all data sets with taxon coverage densities < 0.90. They were not found, however, in high-coverage-density (i.e., ≥ 0.94) transcriptomic and genomic data sets. The terraces could be very large, and terrace size varied inversely with taxon coverage density and with gene sampling sufficiency. Few data sets achieved the theoretical minimum gene sampling depth needed to reduce terrace size to a single tree. Terraces found during bootstrap resampling reduced overall support.

CONCLUSIONS

If certain inference assumptions apply, trees estimated from empirical data sets often belong to large terraces of equally optimal trees. Terrace size correlates with data set sampling properties. Data sets seldom include enough genes to reduce terrace size to one tree. When bootstrap replicate trees lie on a terrace, statistical support for phylogenetic hypotheses may be reduced. Although some of the published analyses surveyed were conducted with edge-linked inference models (which do not induce terraces), edge-unlinked models have also been used and advocated. The present study describes the potential impact of that inference assumption on phylogenetic inference in the context of the kinds of multigene data sets now widely assembled for large-scale tree construction.
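Why parsimony and edge-unlinked models can induce terraces may be illustrated with a small sketch. Under these models a tree's score is a sum of per-gene scores, each of which depends only on the subtree induced by the taxa sampled for that gene; two different trees whose induced subtrees agree for every gene therefore receive identical scores. The Python below checks that condition, assuming trees are supplied as sets of bipartitions (splits) rather than parsed from Newick; all names (`make_split`, `restrict`, `same_terrace`) are hypothetical.

```python
from typing import FrozenSet, Iterable, Set

# An unrooted split is an unordered pair of disjoint taxon sets.
Split = FrozenSet[FrozenSet[str]]


def make_split(side: Iterable[str], taxa: Set[str]) -> Split:
    """Build the split that separates `side` from the remaining taxa."""
    a = frozenset(side)
    b = frozenset(taxa) - a
    return frozenset({a, b})


def restrict(splits: Set[Split], subset: Set[str]) -> Set[Split]:
    """Nontrivial splits of the subtree induced on `subset`.

    Each split A|B restricts to (A & subset)|(B & subset); only splits
    with at least two taxa on each side carry topological information.
    """
    out: Set[Split] = set()
    for split in splits:
        a, b = tuple(split)
        a2, b2 = a & subset, b & subset
        if len(a2) >= 2 and len(b2) >= 2:
            out.add(frozenset({frozenset(a2), frozenset(b2)}))
    return out


def same_terrace(t1: Set[Split], t2: Set[Split],
                 loci: Iterable[Set[str]]) -> bool:
    """True if both trees induce identical subtrees for every locus."""
    return all(restrict(t1, locus) == restrict(t2, locus) for locus in loci)


# Hypothetical example: five taxa, two genes with partial taxon coverage.
taxa = set("ABCDE")
loci = [set("ABCD"), set("ABCE")]
t1 = {make_split(s, taxa) for s in ("AB", "DE")}  # ((A,B),C,(D,E))
t2 = {make_split(s, taxa) for s in ("AB", "CE")}  # ((A,B),D,(C,E))
t3 = {make_split(s, taxa) for s in ("AC", "DE")}  # ((A,C),B,(D,E))
print(same_terrace(t1, t2, loci))  # → True
print(same_terrace(t1, t3, loci))  # → False
```

In this example t1 and t2 are different trees, yet each gene sees the same induced subtree in both, because no gene samples C, D and E together and their relative placement is never constrained. Tree t3 breaks the (A,B) grouping that both genes observe, so it lies off that terrace.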

Fig. 1 (image): https://cdn.ncbi.nlm.nih.gov/pmc/blobs/052c/5885316/c2fba5866fe4/12862_2018_1162_Fig1_HTML.jpg