Hide and seek: placing and finding an optimal tree for thousands of homoplasy-rich sequences.

Affiliation

Biomathematics Research Centre, University of Canterbury, Christchurch, New Zealand.

Publication information

Mol Phylogenet Evol. 2013 Dec;69(3):1186-9. doi: 10.1016/j.ympev.2013.08.001. Epub 2013 Aug 9.

Abstract

Finding optimal evolutionary trees from sequence data is typically an intractable problem, and there is usually no way of knowing how close to optimal the best tree from a given search truly is. The problem would seem to be particularly acute when there are many taxa and when the data have high levels of homoplasy, meaning that individual characters require many changes to fit on the best tree. However, a recent mathematical result has provided a precise tool to generate a small number of high-homoplasy characters for any given tree, so that this tree is provably the optimal tree under the maximum parsimony criterion. This provides, for the first time, a rigorous way to test tree search algorithms on homoplasy-rich data, where we know in advance what the 'best' tree is. In this short note we consider just one search program (TNT) but show that it is able to locate the globally optimal tree correctly for 32,768 taxa, even though the characters in the dataset require, on average, 1148 state-changes each to fit on this tree, and the number of characters is only 57.
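
To make the phrase "state-changes to fit on this tree" concrete, the following is a minimal sketch of Fitch's small-parsimony count for a single character on a rooted binary tree. It is an illustration only: it is not the construction used in the paper and not how TNT works internally. The nested-tuple tree encoding, the function name `fitch_changes`, and the four-taxon example are all assumptions introduced here.

```python
def fitch_changes(tree, states):
    """Minimum number of state changes for one character on a rooted binary tree.

    tree   : a leaf name (str) or a 2-tuple (left_subtree, right_subtree)
    states : dict mapping each leaf name to its observed character state
    (Both encodings are hypothetical, chosen for this sketch.)
    """
    changes = 0

    def visit(node):
        nonlocal changes
        if isinstance(node, str):
            # A leaf contributes the singleton set of its observed state.
            return {states[node]}
        left, right = node
        a, b = visit(left), visit(right)
        common = a & b
        if common:
            # The children can agree on a state: no change needed here.
            return common
        # Disjoint state sets: at least one change on an edge below this node.
        changes += 1
        return a | b

    visit(tree)
    return changes


# One binary character scored on two alternative trees for taxa A-D.
character = {"A": 0, "B": 1, "C": 0, "D": 1}
tree_1 = (("A", "B"), ("C", "D"))
tree_2 = (("A", "C"), ("B", "D"))

print(fitch_changes(tree_1, character))  # 2 changes
print(fitch_changes(tree_2, character))  # 1 change
```

A binary character needs at least one change on any tree, so the second change on `tree_1` is homoplasy. Under maximum parsimony, a tree's score is the sum of these counts over all characters, and the optimal tree is the one with the smallest sum.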
