Janies Daniel A, Studer Jonathon, Handelman Samuel K, Linchangco Gregorio
Department of Bioinformatics and Genomics, College of Computing and Informatics, University of North Carolina at Charlotte, 9201 University City Blvd, Charlotte, NC, 28223, USA.
Case Western Reserve University School of Law, 11075 East Boulevard, Cleveland, OH, 44106, USA.
Cladistics. 2013 Oct;29(5):560-566. doi: 10.1111/cla.12014. Epub 2013 Feb 18.
It has been proposed that supertree approaches be applied to large multilocus datasets to achieve computational tractability. Large datasets, such as those derived from phylogenomics studies, can be broken into many locus-specific tree searches, and the resulting trees can be stitched together via a supertree method. Using simulated data, workers have reported that they can rapidly construct a supertree that is comparable to the results of heuristic tree search on the entire dataset. To test this assertion with organismal data, we compare tree length under the parsimony criterion and computational time for 20 multilocus datasets using supertree (SuperFine and SuperTriplets) and supermatrix (heuristic search in TNT) approaches. Tree lengths and computational times were compared among methods using the Wilcoxon matched-pairs signed rank test. Supermatrix searches produced significantly shorter trees than either supertree approach (SuperFine or SuperTriplets; P < 0.0002 in both cases). Moreover, the processing time of the supermatrix search was significantly lower than that of SuperFine + locus-specific search (P < 0.01) but roughly equivalent to that of SuperTriplets + locus-specific search (P > 0.4, not significant). In conclusion, we show by using real rather than simulated data that there is no basis, either in time tractability or in tree length, for the use of supertrees over heuristic tree search using a supermatrix for phylogenomics.
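The paired comparison described in the abstract can be illustrated with a minimal sketch of an exact two-sided Wilcoxon matched-pairs signed rank test. This is not the authors' analysis code. The function name `signed_rank_test` and the eight tree-length pairs are hypothetical, invented purely for illustration, and the abstract does not say whether an exact or an approximate version of the test was used. Zero differences are dropped and tied absolute differences receive average ranks, which is one common convention.

```python
from itertools import groupby


def signed_rank_test(x, y):
    """Exact two-sided Wilcoxon matched-pairs signed rank test.

    Returns (w_plus, w_minus, p_value). Zero differences are dropped
    and tied absolute differences receive average ranks.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    if not diffs:
        raise ValueError("all paired differences are zero")
    ordered = sorted(diffs, key=abs)

    # Ranks are stored doubled so that average ranks stay integers.
    signed = []
    position = 0
    for _, group in groupby(ordered, key=abs):
        group = list(group)
        first, last = position + 1, position + len(group)
        doubled_rank = first + last  # equals 2 * average rank of the tie group
        signed.extend((doubled_rank, d > 0) for d in group)
        position = last

    w_plus = sum(r for r, positive in signed if positive)
    w_minus = sum(r for r, positive in signed if not positive)

    # Exact null distribution: each rank carries a positive sign with
    # probability 1/2, so count the sign assignments giving each rank sum.
    total = w_plus + w_minus
    counts = [0] * (total + 1)
    counts[0] = 1
    for r, _ in signed:
        for s in range(total, r - 1, -1):
            counts[s] += counts[s - r]

    smaller = min(w_plus, w_minus)
    tail = sum(counts[: smaller + 1])
    p_value = min(1.0, 2 * tail / 2 ** len(signed))
    return w_plus / 2, w_minus / 2, p_value


# Hypothetical parsimony tree lengths (steps) for 8 datasets; NOT the paper's data.
supermatrix = [1200, 3410, 980, 2250, 4100, 1575, 2890, 660]
supertree = [1215, 3460, 981, 2290, 4180, 1602, 2950, 668]

w_plus, w_minus, p = signed_rank_test(supertree, supermatrix)
print(f"W+ = {w_plus}, W- = {w_minus}, two-sided P = {p:.4f}")
```

In this invented example every supertree is longer than its supermatrix counterpart, so the smaller rank sum is zero and the exact two-sided P value is 2/2^8, about 0.0078. With n paired datasets, the smallest P value this exact test can return is 2/2^n.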