Biomathematics Research Centre, University of Canterbury, Christchurch, New Zealand.
Mol Phylogenet Evol. 2013 Dec;69(3):1186-9. doi: 10.1016/j.ympev.2013.08.001. Epub 2013 Aug 9.
Finding optimal evolutionary trees from sequence data is typically an intractable problem, and there is usually no way of knowing how close to optimal the best tree from a given search truly is. The problem would seem to be particularly acute when there are many taxa and when the data have high levels of homoplasy, in which individual characters require many changes to fit on the best tree. However, a recent mathematical result provides a precise tool for generating a small number of high-homoplasy characters for any given tree, such that this tree is provably the optimal tree under the maximum parsimony criterion. This provides, for the first time, a rigorous way to test tree search algorithms on homoplasy-rich data for which the 'best' tree is known in advance. In this short note we consider just one search program (TNT), but show that it correctly locates the globally optimal tree for 32,768 taxa, even though the dataset contains only 57 characters and these require, on average, 1148 state changes each to fit on this tree.
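To make "state changes required to fit a character on a tree" concrete, the following minimal sketch counts that number for a single character on a fixed rooted binary tree using Fitch's algorithm for unordered states. It is an illustration only, not TNT's implementation or the construction used in the note; the nested-tuple tree encoding, the taxon names, the toy characters, and the function name `fitch_changes` are all assumptions made for this example.

```python
from typing import Dict, Tuple, Union

# Hypothetical encoding: a leaf is a taxon name, an internal node is a pair of subtrees.
Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch_changes(tree: Tree, states: Dict[str, str]) -> int:
    """Minimum number of state changes one unordered character needs on `tree`."""

    def visit(node: Tree):
        # Returns (candidate state set for this node, changes counted below it).
        if isinstance(node, str):
            return {states[node]}, 0
        left_set, left_cost = visit(node[0])
        right_set, right_cost = visit(node[1])
        common = left_set & right_set
        if common:
            # The children can agree on a state, so no extra change is needed here.
            return common, left_cost + right_cost
        # The children cannot agree: one additional change is required.
        return left_set | right_set, left_cost + right_cost + 1

    return visit(tree)[1]


# Toy 8-taxon balanced tree and two binary characters (illustrative data only).
tree = ((("A", "B"), ("C", "D")), (("E", "F"), ("G", "H")))
clean = dict(A="0", B="0", C="0", D="0", E="1", F="1", G="1", H="1")
noisy = dict(A="0", B="1", C="0", D="1", E="0", F="1", G="0", H="1")

print(fitch_changes(tree, clean))  # 1: a single change separates the two clades
print(fitch_changes(tree, noisy))  # 4: a high-homoplasy character on the same tree
```

The parsimony score of a tree is the sum of these counts over all characters, and a search program such as TNT looks for the tree that minimises that sum. The recursive traversal is adequate for small or balanced trees; a very unbalanced tree with thousands of taxa would need an iterative traversal to stay within Python's recursion limit.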