Real Jardin Botanico, Department of Biodiversity and Conservation, CSIC, Plaza de Murillo 2, 28014 Madrid, Spain.
Syst Biol. 2011 Jan;60(1):32-44. doi: 10.1093/sysbio/syq057. Epub 2010 Nov 10.
For the last two decades, supertree reconstruction has been an active field of research and has seen the development of a large number of major algorithms. Because of the growing popularity of supertree methods, it has become necessary to evaluate the performance of these algorithms to determine which are the best options (especially relative to the widely used supermatrix approach). In this study, seven of the most commonly used supertree methods are investigated using a large empirical data set (in terms of number of taxa and molecular markers) from the worldwide flowering plant family Sapindaceae. Supertree methods were evaluated using several criteria: similarity of the supertrees to the input trees, similarity between the supertrees and the total evidence tree, level of resolution of the supertree, and computational time required by the algorithm. Additional analyses were also conducted on a reduced data set to test whether the performance levels were affected by the heuristic searches rather than by the algorithms themselves. Based on our results, two main groups of supertree methods were identified: the matrix representation with parsimony (MRP), MinFlip, and MinCut methods performed well according to our criteria, whereas the average consensus, split fit, and most similar supertree methods performed more poorly or at least did not behave the same way as the total evidence tree. Results for the super distance matrix, the most recent approach tested here, were promising, with at least one derived method performing as well as MRP, MinFlip, and MinCut. The output of each method was only slightly improved when applied to the reduced data set, suggesting a correct behavior of the heuristic searches and a relatively low sensitivity of the algorithms to data set size and missing data.
Results also showed that the MRP analyses could reach a high level of quality even when using a simple heuristic search strategy, with the exception of MRP with the Purvis coding scheme and reversible parsimony. The future of supertrees lies in the implementation of a standardized heuristic search for all methods and in increased computing power to handle large data sets. The latter would prove particularly useful for promising approaches such as the maximum quartet fit method, which still requires substantial computing power.
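To make the matrix representation step behind MRP concrete, the following is a minimal sketch of the standard Baum-Ragan coding, not the pipeline used in this study. It assumes input trees are given as nested Python tuples of taxon names. The function names `clades` and `mrp_matrix` and the taxa `A` to `E` are hypothetical and chosen only for illustration. Each non-root clade of each input tree becomes one binary character: taxa inside the clade are scored `1`, other taxa of that tree `0`, and taxa absent from that tree `?`. The Purvis coding scheme mentioned above scores these characters differently and is not shown.

```python
def clades(tree):
    """Return (leaf set, list of internal-node leaf sets) for a nested-tuple tree."""
    if isinstance(tree, str):
        # A leaf contributes its own name and no internal clades.
        return frozenset([tree]), []
    leaves = frozenset()
    found = []
    for child in tree:
        child_leaves, child_clades = clades(child)
        leaves |= child_leaves
        found.extend(child_clades)
    found.append(leaves)  # the clade subtended by this internal node
    return leaves, found


def mrp_matrix(trees):
    """Build a Baum-Ragan MRP matrix as a dict mapping taxon -> character string."""
    taxa = sorted(set().union(*(clades(t)[0] for t in trees)))
    rows = {taxon: [] for taxon in taxa}
    for tree in trees:
        leaves, found = clades(tree)
        for clade in found:
            if clade == leaves:
                continue  # the root clade groups every taxon of the tree
            for taxon in taxa:
                if taxon not in leaves:
                    rows[taxon].append("?")  # taxon missing from this input tree
                elif taxon in clade:
                    rows[taxon].append("1")
                else:
                    rows[taxon].append("0")
    return {taxon: "".join(states) for taxon, states in rows.items()}


# Two hypothetical input trees with partially overlapping taxon sets.
trees = [(("A", "B"), ("C", "D")), (("A", "C"), "E")]
for taxon, states in mrp_matrix(trees).items():
    print(taxon, states)
```

The resulting matrix would then be passed to a parsimony program, whose heuristic search produces the supertree. That search is outside the scope of this sketch.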