Bernt Matthias, Merkle Daniel, Middendorf Martin
Department of Computer Science, Parallel Computing and Complex Systems Group, University of Leipzig, Germany.
IEEE/ACM Trans Comput Biol Bioinform. 2008 Jul-Sep;5(3):332-47. doi: 10.1109/TCBB.2008.39.
Genomic rearrangement operations can be very useful to infer the phylogenetic relationship of gene orders representing species. We study the problem of finding potential ancestral gene orders for the gene orders of given taxa, such that the corresponding rearrangement scenario has a minimal number of reversals, and where each of the reversals has to preserve the common intervals of the given input gene orders. Common intervals identify sets of genes that occur consecutively in all input gene orders. The problem of finding such an ancestral gene order is called the preserving reversal median problem (pRMP). A tree-based data structure for the representation of the common intervals of all input gene orders is used in our exact algorithm TCIP for solving the pRMP. It is known that the minimum number of reversals to transform one gene order into another can be computed in polynomial time, whereas the corresponding problem with the restriction that common intervals should not be destroyed is already NP-hard. It is shown theoretically that TCIP can solve a large class of pRMP instances in polynomial time. Empirically we show the good performance of TCIP on biological and artificial data.
基因组重排操作对于推断代表物种的基因顺序的系统发育关系可能非常有用。我们研究为给定分类单元的基因顺序寻找潜在祖先基因顺序的问题,使得相应的重排场景具有最少的反转次数,并且每个反转都必须保留给定输入基因顺序的公共区间。公共区间识别在所有输入基因顺序中连续出现的基因集。寻找这样一个祖先基因顺序的问题称为保留反转中位数问题(pRMP)。我们用于解决pRMP的精确算法TCIP中使用了一种基于树的数据结构来表示所有输入基因顺序的公共区间。已知在多项式时间内可以计算将一个基因顺序转换为另一个基因顺序所需的最少反转次数,而对于不能破坏公共区间这一限制的相应问题已经是NP难的。从理论上证明了TCIP可以在多项式时间内解决一大类pRMP实例。通过实验我们展示了TCIP在生物数据和人工数据上的良好性能。