Bernt Matthias, Merkle Daniel, Middendorf Martin
Parallel Computing and Complex Systems Group, Department of Computer Science, University of Leipzig, Augustusplatz, Leipzig, Germany.
IEEE/ACM Trans Comput Biol Bioinform. 2006 Jul-Sep;3(3):275-88. doi: 10.1109/TCBB.2006.38.
The order of genes in the genomes of species can change during evolution and can provide information about their phylogenetic relationship. An interesting method to infer the phylogenetic relationship from the gene orders is to use different types of rearrangement operations and to find possible rearrangement scenarios using these operations. One of the most common rearrangement operations is reversals, which reverse the order of a subset of neighbored genes. In this paper, we study the problem to find the ancestral gene order for three species represented by their gene orders. The rearrangement scenario should use a minimal number of reversals and no other rearrangement operations. This problem is called the Median problem and is known to be NP-complete. In this paper, we describe a heuristic algorithm for finding solutions to the Median problem that searches for rearrangement scenarios with the additional property that gene groups should not be destroyed by reversal operations. The concept of conserved intervals for signed permutations is used to describe such gene groups. We show experimentally, for different types of test problems, that the proposed algorithm produces very good results compared to other algorithms for the Median problem. We also integrate our reversal selection procedure into the well-known MGR and GRAPPA algorithms and show that they achieve a significant speedup while obtaining solutions of the same quality as the original algorithms on the test problems.
在进化过程中,物种基因组中基因的顺序会发生变化,这能够提供有关它们系统发育关系的信息。一种从基因顺序推断系统发育关系的有趣方法是使用不同类型的重排操作,并利用这些操作找到可能的重排场景。最常见的重排操作之一是反转,它会反转相邻基因子集的顺序。在本文中,我们研究了一个问题,即根据三个物种的基因顺序来找到它们的祖先基因顺序。重排场景应使用最少数量的反转操作,且不使用其他重排操作。这个问题被称为中位数问题,已知是NP完全问题。在本文中,我们描述了一种启发式算法来解决中位数问题,该算法会搜索具有基因组不会因反转操作而被破坏这一附加属性的重排场景。带符号排列的保守区间概念被用于描述此类基因组。我们通过实验表明,对于不同类型的测试问题,与中位数问题的其他算法相比,所提出的算法产生了非常好的结果。我们还将我们的反转选择过程集成到了著名的MGR和GRAPPA算法中,并表明它们在获得与原始算法在测试问题上相同质量的解的同时,实现了显著的加速。