Arnold Christian, Stadler Peter F
Bioinformatics Group, Department of Computer Science, and Interdisciplinary Center for Bioinformatics, University of Leipzig, Härtelstrasse 16-18, D-04107 Leipzig, Germany.
Algorithms Mol Biol. 2010 Jun 2;5:25. doi: 10.1186/1748-7188-5-25.
The Maximal Pairing Problem (MPP) is the prototype of a class of combinatorial optimization problems that are of considerable interest in bioinformatics: Given an arbitrary phylogenetic tree T and weights omegaxy for the paths between any two pairs of leaves (x, y), what is the collection of edge-disjoint paths between pairs of leaves that maximizes the total weight? Special cases of the MPP for binary trees and equal weights have been described previously; algorithms to solve the general MPP are still missing, however.
We describe a relatively simple dynamic programming algorithm for the special case of binary trees. We then show that the general case of multifurcating trees can be treated by interleaving solutions to certain auxiliary Maximum Weighted Matching problems with an extension of this dynamic programming approach, resulting in an overall polynomial-time solution of complexity (n4 log n) w.r.t. the number n of leaves. The source code of a C implementation can be obtained under the GNU Public License from http://www.bioinf.uni-leipzig.de/Software/Targeting. For binary trees, we furthermore discuss several constrained variants of the MPP as well as a partition function approach to the probabilistic version of the MPP.
The algorithms introduced here make it possible to solve the MPP also for large trees with high-degree vertices. This has practical relevance in the field of comparative phylogenetics and, for example, in the context of phylogenetic targeting, i.e., data collection with resource limitations.
最大配对问题(MPP)是一类组合优化问题的原型,在生物信息学中备受关注:给定一棵任意的系统发育树T以及任意两对叶子节点(x,y)之间路径的权重ωxy,能使总权重最大的叶子节点对之间的边不相交路径集合是什么?之前已经描述了二叉树和等权重情况下MPP的特殊情况;然而,解决一般MPP的算法仍然缺失。
我们针对二叉树的特殊情况描述了一种相对简单的动态规划算法。然后我们表明,多叉树的一般情况可以通过将某些辅助最大加权匹配问题的解决方案与这种动态规划方法的扩展交织起来处理,从而得到一个关于叶子节点数量n的复杂度为(n4 log n)的整体多项式时间解决方案。可以在GNU公共许可证下从http://www.bioinf.uni-leipzig.de/Software/Targeting获取C实现的源代码。对于二叉树,我们还讨论了MPP的几种受限变体以及MPP概率版本的配分函数方法。
这里介绍的算法使得也能够解决具有高度数顶点的大树的MPP。这在比较系统发育学领域具有实际意义,例如在系统发育靶向的背景下,即在资源有限的数据收集方面。