Division of Invertebrate Zoology, American Museum of Natural History, New York, NY 10024, USA.
BMC Bioinformatics. 2013 Feb 26;14:66. doi: 10.1186/1471-2105-14-66.
A phylogeny postulates shared ancestry relationships among organisms in the form of a binary tree. Phylogenies attempt to answer an important question posed in biology: what are the ancestor-descendent relationships between organisms? At the core of every biological problem lies a phylogenetic component. The patterns that can be observed in nature are the product of complex interactions, constrained by the template that our ancestors provide. The problem of simultaneous tree and alignment estimation under Maximum Parsimony is known in combinatorial optimization as the Generalized Tree Alignment Problem (GTAP). The GTAP is the Steiner Tree Problem for the sequence edit distance. Like many biologically interesting problems, the GTAP is NP-Hard. Typically the Steiner Tree is presented under the Manhattan or the Hamming distances.
Experimentally, the accuracy of the GTAP has been subjected to evaluation. Results show that phylogenies selected using the GTAP from unaligned sequences are competitive with the best methods and algorithms available. Here, we implement and explore experimentally existing and new local search heuristics for the GTAP using simulated and real data.
The methods presented here improve by more than three orders of magnitude in execution time the best local search heuristics existing to date when applied to real data.
系统发育树假设有共同的祖先关系的生物以二进制树的形式。系统发育树试图回答生物学中的一个重要问题:生物体之间的祖先-后代关系是什么?每个生物学问题的核心都有一个系统发育成分。在自然界中可以观察到的模式是复杂相互作用的产物,受限于我们祖先提供的模板。最大简约法下同时进行树和比对估计的问题在组合优化中被称为广义树比对问题(GTAP)。GTAP 是序列编辑距离的 Steiner 树问题。像许多有趣的生物学问题一样,GTAP 是 NP 难的。通常,Steiner 树是在曼哈顿或汉明距离下呈现的。
实验中,GTAP 的准确性已经过评估。结果表明,使用 GTAP 从未对齐的序列中选择的系统发育与现有的最佳方法和算法具有竞争力。在这里,我们使用模拟和真实数据实现和实验性地探索了 GTAP 的现有和新的局部搜索启发式方法。
当应用于真实数据时,与迄今为止存在的最佳局部搜索启发式方法相比,本文提出的方法在执行时间上提高了三个数量级以上。