Towards building the tree of life: a simulation study for all angiosperm genera.

Author information

Salamin Nicolas, Hodkinson Trevor R, Savolainen Coates Vincent

Affiliation

Department of Botany, University of Dublin, Trinity College, Dublin 2, Ireland.

Publication information

Syst Biol. 2005 Apr;54(2):183-96. doi: 10.1080/10635150590923254.

Abstract

Comprehensive phylogenetic trees are essential tools for understanding evolutionary processes. For many groups of organisms, and for projects aiming to build the Tree of Life, comprehensive phylogenetic analysis implies sampling hundreds to thousands of taxa; for the tree of all life, this number rises to a highly conservative estimate of 13 million. Here, we assessed the performance of methods for reconstructing large trees using Monte Carlo simulations with parameters inferred from four large angiosperm DNA matrices containing between 141 and 567 taxa. For each data set, the parameters of the HKY85+G model were estimated and used to simulate 20 new matrices with sequence lengths from 100 to 10,000 base pairs. Maximum parsimony and neighbor joining were used to analyze each simulated matrix. In our simulations, accuracy was measured by counting the number of nodes in the model tree that were correctly inferred. The accuracy of both methods increased very quickly as characters were added, before reaching a plateau at around 1,000 nucleotides for all tree sizes simulated. Increasing the number of taxa from 141 to 567 did not significantly decrease the accuracy of the methods used, despite the greater complexity of tree space. Moreover, the distribution of branch lengths, rather than the rate of evolution, was found to be the most important factor for accurately inferring these large trees. Finally, a tree containing 13,000 taxa was created to represent a hypothetical tree of all angiosperm genera, and the efficiency of phylogenetic reconstruction was tested with simulated matrices containing an increasing number of nucleotides, up to a maximum of 30,000. Even with such a large tree, our simulations suggested that simple heuristic searches were able to infer up to 80% of the nodes correctly.
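The abstract scores accuracy as the number of model-tree nodes that the inferred tree recovers. A minimal Python sketch of such a score follows. It assumes that both trees contain the same taxa and are compared as unrooted trees through their internal bipartitions (splits). The nested-tuple tree representation and the names `leaves`, `splits` and `node_accuracy` are hypothetical illustrations, not the authors' code. The sequence simulation under HKY85+G and the parsimony and neighbor-joining searches are not reproduced here.

```python
from typing import FrozenSet, Set, Tuple, Union

# A rooted tree written as nested tuples of taxon names (an assumed
# representation; real studies would parse Newick files instead).
Tree = Union[str, Tuple["Tree", ...]]


def leaves(tree: Tree) -> FrozenSet[str]:
    """Return the set of taxon names at the tips of `tree`."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def splits(tree: Tree) -> Set[FrozenSet[str]]:
    """Return the non-trivial bipartitions (internal edges) of `tree`,
    ignoring the root so that trees are compared as unrooted."""
    all_taxa = leaves(tree)
    anchor = min(all_taxa)  # each split is stored as the side lacking this taxon
    result: Set[FrozenSet[str]] = set()

    def visit(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):
            return frozenset([node])
        clade = frozenset().union(*(visit(child) for child in node))
        side = clade if anchor not in clade else all_taxa - clade
        # Skip the root (empty side) and trivial splits (one taxon vs the rest).
        if 1 < len(side) < len(all_taxa) - 1:
            result.add(side)
        return clade

    visit(tree)
    return result


def node_accuracy(model: Tree, inferred: Tree) -> float:
    """Fraction of the model tree's internal nodes recovered by `inferred`."""
    model_splits = splits(model)
    if not model_splits:
        return 1.0
    return len(model_splits & splits(inferred)) / len(model_splits)


# Toy six-taxon example (hypothetical trees, not data from the study).
model = ((("A", "B"), "C"), ("D", ("E", "F")))
inferred = ((("A", "C"), "B"), ("D", ("E", "F")))
rerooted = ("A", ("B", ("C", ("D", ("E", "F")))))

print(node_accuracy(model, inferred))  # 2 of 3 internal nodes recovered
print(node_accuracy(model, rerooted))  # same unrooted topology, so all recovered
```

Because each split is stored relative to a fixed anchor taxon, rerooting a tree does not change its score. Dividing the count of recovered nodes by the number of model splits turns it into a proportion, which matches how the abstract states its 80% figure.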
