Dong Jianrong, Fernández-Baca David, McMorris F R
Department of Computer Science, Iowa State University, Ames, IA 50011, USA.
Algorithms Mol Biol. 2010 Jan 4;5:2. doi: 10.1186/1748-7188-5-2.
Supertree methods combine the phylogenetic information from multiple partially-overlapping trees into a larger phylogenetic tree called a supertree. Several supertree construction methods have been proposed to date, but most of these are not designed with any specific properties in mind. Recently, Cotton and Wilkinson proposed extensions of the majority-rule consensus tree method to the supertree setting that inherit many of the appealing properties of the former.
We study a variant of one of Cotton and Wilkinson's methods, called majority-rule (+) supertrees. After proving that a key underlying problem for constructing majority-rule (+) supertrees is NP-hard, we develop a polynomial-size exact integer linear programming formulation of the problem. We then present a data reduction heuristic that identifies smaller subproblems that can be solved independently. While this technique is not guaranteed to produce optimal solutions, it can achieve substantial problem-size reduction. Finally, we report on a computational study of our approach on various real data sets, including the 121-taxon, 7-tree Seabirds data set of Kennedy and Page.
The results indicate that our exact method is computationally feasible for moderately large inputs. For larger inputs, our data reduction heuristic makes it feasible to tackle problems that are well beyond the range of the basic integer programming approach. Comparisons between the results obtained by our heuristic and exact solutions indicate that the heuristic produces good answers. Our results also suggest that the majority-rule (+) approach, in both its basic form and with data reduction, yields biologically meaningful phylogenies.
超树方法将来自多个部分重叠树的系统发育信息组合成一个更大的系统发育树,称为超树。迄今为止,已经提出了几种超树构建方法,但其中大多数在设计时并未考虑任何特定属性。最近,科顿和威尔金森将多数规则一致树方法扩展到超树设置,继承了前者的许多吸引人的属性。
我们研究了科顿和威尔金森的一种方法的变体,称为多数规则(+)超树。在证明构建多数规则(+)超树的一个关键潜在问题是NP难问题之后,我们开发了该问题的多项式规模精确整数线性规划公式。然后,我们提出了一种数据约简启发式方法,该方法可以识别可以独立解决的较小子问题。虽然这种技术不能保证产生最优解,但它可以显著减小问题规模。最后,我们报告了对我们的方法在各种真实数据集上的计算研究,包括肯尼迪和佩奇的121分类单元、7棵树的海鸟数据集。
结果表明,我们的精确方法对于中等规模的输入在计算上是可行的。对于更大的输入,我们的数据约简启发式方法使得处理远远超出基本整数规划方法范围的问题成为可能。我们的启发式方法得到的结果与精确解之间的比较表明,该启发式方法能产生较好的答案。我们的结果还表明,多数规则(+)方法,无论是其基本形式还是经过数据约简后,都能产生具有生物学意义的系统发育树。