The prevalence of terraced treescapes in analyses of phylogenetic data sets.

Affiliation

Department of Ecology and Evolutionary Biology, University of Arizona, 1041 E. Lowell St, Tucson, AZ, 85721, USA.

Publication information

BMC Evol Biol. 2018 Apr 4;18(1):46. doi: 10.1186/s12862-018-1162-9.

DOI:10.1186/s12862-018-1162-9

PMID:29618314

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5885316/

Abstract

BACKGROUND

The pattern of data availability in a phylogenetic data set may lead to the formation of terraces, collections of equally optimal trees. Terraces can arise in tree space if trees are scored with parsimony or with partitioned, edge-unlinked maximum likelihood. Theory predicts that terraces can be large, but their prevalence in contemporary data sets has never been surveyed. We selected 26 data sets and phylogenetic trees reported in recent literature and investigated the terraces to which the trees would belong, under a common set of inference assumptions. We examined terrace size as a function of the sampling properties of the data sets, including taxon coverage density (the proportion of taxon-by-gene positions with any data present) and a measure of gene sampling "sufficiency". We evaluated each data set in relation to the theoretical minimum gene sampling depth needed to reduce terrace size to a single tree, and explored the impact of the terraces found in replicate trees in bootstrap methods.
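Taxon coverage density, as defined above, is the fraction of taxon-by-gene cells that contain any data. A minimal sketch of that calculation follows; the function name and the small presence/absence matrix are hypothetical and are not taken from the study's data or software.

```python
from typing import Sequence


def taxon_coverage_density(presence: Sequence[Sequence[bool]]) -> float:
    """Return the proportion of taxon-by-gene cells with any data present.

    `presence[i][j]` is True when taxon i has data for gene j.
    """
    n_cells = sum(len(row) for row in presence)
    if n_cells == 0:
        raise ValueError("presence matrix is empty")
    n_present = sum(1 for row in presence for cell in row if cell)
    return n_present / n_cells


# Hypothetical data set: 4 taxa (rows) by 3 genes (columns).
presence = [
    [True, True, True],
    [True, False, True],
    [True, True, False],
    [False, True, True],
]

density = taxon_coverage_density(presence)  # 9 of 12 cells are filled
```

In this toy matrix 9 of the 12 cells are filled, so the density is 0.75.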

RESULTS

Terraces were identified in nearly all data sets with taxon coverage densities < 0.90. They were not found, however, in high-coverage-density (i.e., ≥ 0.94) transcriptomic and genomic data sets. The terraces could be very large, and size varied inversely with taxon coverage density and with gene sampling sufficiency. Few data sets achieved a theoretical minimum gene sampling depth needed to reduce terrace size to a single tree. Terraces found during bootstrap resampling reduced overall support.

CONCLUSIONS

If certain inference assumptions apply, trees estimated from empirical data sets often belong to large terraces of equally optimal trees. Terrace size correlates to data set sampling properties. Data sets seldom include enough genes to reduce terrace size to one tree. When bootstrap replicate trees lie on a terrace, statistical support for phylogenetic hypotheses may be reduced. Although some of the published analyses surveyed were conducted with edge-linked inference models (which do not induce terraces), unlinked models have been used and advocated. The present study describes the potential impact of that inference assumption on phylogenetic inference in the context of the kinds of multigene data sets now widely assembled for large-scale tree construction.
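To illustrate how missing data can place distinct trees on one terrace, the sketch below assumes the usual definition from the terrace literature, which the abstract itself does not spell out: two trees share a terrace when they induce identical subtrees on each gene's set of sampled taxa. Trees are represented by their nontrivial splits. All names and the five-taxon example are hypothetical.

```python
from typing import FrozenSet, Iterable, Sequence, Set

Split = FrozenSet[str]  # one side of a bipartition of the full taxon set


def induced_splits(tree: Iterable[Split],
                   sampled: FrozenSet[str]) -> Set[FrozenSet[FrozenSet[str]]]:
    """Restrict an unrooted tree (given as splits) to the sampled taxa."""
    induced = set()
    for side in tree:
        inside = side & sampled    # sampled taxa on this side of the split
        outside = sampled - side   # sampled taxa on the other side
        # Keep only splits that remain informative after restriction.
        if len(inside) >= 2 and len(outside) >= 2:
            induced.add(frozenset((inside, outside)))
    return induced


def same_terrace(tree_a: Sequence[Split], tree_b: Sequence[Split],
                 gene_taxa: Sequence[FrozenSet[str]]) -> bool:
    """True if both trees induce the same subtree for every gene."""
    return all(induced_splits(tree_a, s) == induced_splits(tree_b, s)
               for s in gene_taxa)


# Hypothetical example: taxa A-E, two genes with different missing taxa.
genes = [frozenset("ABDE"), frozenset("ACDE")]
t1 = [frozenset("AB"), frozenset("DE")]  # ((A,B),C,(D,E))
t2 = [frozenset("AC"), frozenset("DE")]  # ((A,C),B,(D,E))
t3 = [frozenset("AD"), frozenset("BE")]  # ((A,D),C,(B,E))

on_terrace = same_terrace(t1, t2, genes)
off_terrace = same_terrace(t1, t3, genes)
```

Here `t1` and `t2` differ in topology, but neither gene samples both B and C, so both trees reduce to the same subtree for each gene and `on_terrace` is `True`. `t3` induces a different subtree on the first gene, so `off_terrace` is `False`.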


Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/052c/5885316/c2fba5866fe4/12862_2018_1162_Fig1_HTML.jpg

Similar articles

Impacts of Terraces on Phylogenetic Inference.

Syst Biol. 2015 Sep;64(5):709-26. doi: 10.1093/sysbio/syv024. Epub 2015 May 20.

Terraces in species tree inference from gene trees.

BMC Ecol Evol. 2024 Nov 4;24(1):135. doi: 10.1186/s12862-024-02309-z.

Consequences of Common Topological Rearrangements for Partition Trees in Phylogenomic Inference.

J Comput Biol. 2015 Dec;22(12):1129-42. doi: 10.1089/cmb.2015.0146. Epub 2015 Oct 8.

Gentrius: Generating Trees Compatible With a Set of Unrooted Subtrees and its Application to Phylogenetic Terraces.

Mol Biol Evol. 2024 Nov 1;41(11). doi: 10.1093/molbev/msae219.

Phylogenomics with incomplete taxon coverage: the limits to inference.

BMC Evol Biol. 2010 May 25;10:155. doi: 10.1186/1471-2148-10-155.

Two C++ libraries for counting trees on a phylogenetic terrace.

Bioinformatics. 2018 Oct 1;34(19):3399-3401. doi: 10.1093/bioinformatics/bty384.

Terrace Aware Data Structure for Phylogenomic Inference from Supermatrices.

Syst Biol. 2016 Nov;65(6):997-1008. doi: 10.1093/sysbio/syw037. Epub 2016 Apr 26.

Bayesian and maximum likelihood phylogenetic analyses of protein sequence data under relative branch-length differences and model violation.

BMC Evol Biol. 2005 Jan 28;5:8. doi: 10.1186/1471-2148-5-8.

Terraces in phylogenetic tree space.

Science. 2011 Jul 22;333(6041):448-50. doi: 10.1126/science.1206357. Epub 2011 Jun 16.

Cited by

Terraces in species tree inference from gene trees.

BMC Ecol Evol. 2024 Nov 4;24(1):135. doi: 10.1186/s12862-024-02309-z.

Gentrius: Generating Trees Compatible With a Set of Unrooted Subtrees and its Application to Phylogenetic Terraces.

Mol Biol Evol. 2024 Nov 1;41(11). doi: 10.1093/molbev/msae219.

Phylogeny Estimation Given Sequence Length Heterogeneity.

Syst Biol. 2021 Feb 10;70(2):268-282. doi: 10.1093/sysbio/syaa058.

One thousand plant transcriptomes and the phylogenomics of green plants.

Nature. 2019 Oct;574(7780):679-685. doi: 10.1038/s41586-019-1693-2. Epub 2019 Oct 23.

mtProtEvol: the resource presenting molecular evolution analysis of proteins involved in the function of Vertebrate mitochondria.

BMC Evol Biol. 2019 Feb 26;19(Suppl 1):47. doi: 10.1186/s12862-019-1371-x.

