Nixon Kevin C
L. H. Bailey Hortorium, Department of Plant Biology, Cornell University, Ithaca, New York, 14853.
Cladistics. 1999 Dec;15(4):407-414. doi: 10.1111/j.1096-0031.1999.tb00277.x.
The Parsimony Ratchet is presented as a new method for analysis of large data sets. The method can be easily implemented with existing phylogenetic software by generating batch command files. Such an approach has been implemented in the programs DADA (Nixon, 1998) and Winclada (Nixon, 1999). The Parsimony Ratchet has also been implemented in the most recent versions of NONA (Goloboff, 1998). These implementations of the ratchet use the following steps: (1) Generate a starting tree (e.g., a "Wagner" tree, with or without some level of subsequent branch swapping). (2) Randomly select a subset of characters, each of which is given additional weight (e.g., add 1 to the weight of each selected character). (3) Perform branch swapping (e.g., "branch-breaking" or TBR) on the current tree using the reweighted matrix, keeping only one (or a few) trees. (4) Reset the weights of all characters to the "original" weights (typically, equal weights). (5) Perform branch swapping (e.g., branch-breaking or TBR) on the current tree (from step 3), holding one (or a few) trees. (6) Return to step 2. Steps 2-6 are considered to be one iteration, and typically 50-200 or more iterations are performed. The number of characters to be sampled for reweighting in step 2 is determined by the user; I have found that sampling between 5 and 25% of the characters provides good results in most cases. The performance of the ratchet for large data sets is outstanding, and the results of analyses of the 500-taxon seed plant rbcL data set (Chase et al., 1993) are presented here. A separate analysis of a three-gene data set for 567 taxa will be presented elsewhere (Soltis et al., in preparation), demonstrating the same extraordinary power. With the 500-taxon data set, shortest trees are typically found within 22 h (four runs of 200 iterations) on a 200-MHz Pentium Pro.
These analyses indicate efficiency increases of 20×-80× over "traditional methods" such as varying taxon order randomly and holding few trees, followed by more complete analyses of the best trees found, and speedups of thousands of times over nonstrategic searches with PAUP. Because the ratchet samples many tree islands with fewer trees from each island, it provides much more accurate estimates of the "true" consensus than collecting many trees from few islands. With the ratchet, Goloboff's NONA, and existing computer hardware, data sets that were previously intractable or required months or years of analysis with PAUP* can now be adequately analyzed in a few hours or days.
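The six steps listed in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the DADA, Winclada, or NONA implementation: trees are rooted nested tuples of taxon names, tree length is computed with Fitch parsimony, the starting tree is a random comb rather than a Wagner tree, and branch swapping is a simple greedy subtree-prune-and-regraft (SPR) search standing in for TBR. All function names (`fitch_length`, `swap`, `ratchet`, and the helpers) and the `demo` matrix are hypothetical.

```python
import random

def fitch_length(tree, matrix, weights):
    """Weighted Fitch parsimony length of a rooted binary tree (nested tuples)."""
    total = 0
    for c, w in enumerate(weights):
        def states(node):
            nonlocal total
            if not isinstance(node, tuple):
                return {matrix[node][c]}      # leaf: observed state
            left, right = states(node[0]), states(node[1])
            if left & right:
                return left & right
            total += w                        # one weighted step on this character
            return left | right
        states(tree)
    return total

def nodes(tree):
    """Yield every subtree in preorder, the root first."""
    yield tree
    if isinstance(tree, tuple):
        for child in tree:
            yield from nodes(child)

def prune(tree, sub):
    """Remove subtree `sub` and collapse its parent node."""
    if not isinstance(tree, tuple):
        return tree
    a, b = tree
    if a == sub:
        return b
    if b == sub:
        return a
    return (prune(a, sub), prune(b, sub))

def graft(tree, sub, target):
    """Attach `sub` on the branch leading to node `target`."""
    if tree == target:
        return (tree, sub)
    if not isinstance(tree, tuple):
        return tree
    return (graft(tree[0], sub, target), graft(tree[1], sub, target))

def spr_neighbors(tree):
    """All trees one SPR move away (assumption: SPR used in place of TBR)."""
    for sub in list(nodes(tree))[1:]:
        rest = prune(tree, sub)
        for target in nodes(rest):
            yield graft(rest, sub, target)

def swap(tree, matrix, weights):
    """Greedy branch swapping that holds a single tree."""
    best = fitch_length(tree, matrix, weights)
    improved = True
    while improved:
        improved = False
        for cand in spr_neighbors(tree):
            length = fitch_length(cand, matrix, weights)
            if length < best:
                tree, best, improved = cand, length, True
                break
    return tree

def ratchet(matrix, iterations=50, fraction=0.15, seed=0):
    rng = random.Random(seed)
    taxa = list(matrix)
    n_chars = len(matrix[taxa[0]])
    original = [1] * n_chars                  # equal "original" weights
    rng.shuffle(taxa)                         # step 1: starting tree (random comb,
    tree = taxa[0]                            # a simplification, not a Wagner tree)
    for taxon in taxa[1:]:
        tree = (tree, taxon)
    tree = swap(tree, matrix, original)
    best_tree, best_len = tree, fitch_length(tree, matrix, original)
    k = max(1, round(fraction * n_chars))     # number of characters to upweight
    for _ in range(iterations):               # steps 2-6 form one iteration
        weights = list(original)
        for c in rng.sample(range(n_chars), k):
            weights[c] += 1                   # step 2: add 1 to sampled characters
        tree = swap(tree, matrix, weights)    # step 3: swap on reweighted matrix
        tree = swap(tree, matrix, original)   # steps 4-5: reset weights, swap again
        length = fitch_length(tree, matrix, original)
        if length < best_len:
            best_tree, best_len = tree, length
    return best_tree, best_len

# Toy matrix (hypothetical data): six taxa, four binary characters.
demo = {"A": "1001", "B": "1001", "C": "0101",
        "D": "0101", "E": "0010", "F": "0010"}
print(ratchet(demo, iterations=20))
```

The `fraction` default of 0.15 falls within the 5-25% range the abstract reports as working well. The sketch keeps only the best tree found, whereas a full analysis would also collect equally short trees from different islands before computing a consensus.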