Nguyen C Thach, Tay Y C, Zhang Louxin
School of Computing, National University of Singapore, 117543, Singapore.
Bioinformatics. 2005 May 15;21(10):2171-6. doi: 10.1093/bioinformatics/bti327. Epub 2005 Feb 15.
A one-to-one correspondence between the sets of genes in the two genomes being compared is necessary for the notions of breakpoint and reversal distances. To compare genomes where there are paralogous genes, Sankoff formulated the exemplar distance problem as a general version of the genome rearrangement problem. Unfortunately, the problem is NP-hard even for the breakpoint distance.
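To make the definition concrete, the following is a minimal brute-force sketch of the exemplar breakpoint distance. It is not the authors' program and it uses neither the divide-and-conquer nor the branch-and-bound technique. It assumes linear genomes written as lists of signed integers, where the absolute value names the gene family and the sign gives orientation, and it assumes both genomes contain the same families. The function names are hypothetical.

```python
from itertools import product


def breakpoints(g, h):
    """Count breakpoints between two signed gene orders in which
    every gene occurs exactly once (assumed input format)."""
    # An adjacency (a, b) in h is also readable as (-b, -a) on the reverse strand.
    adjacencies = set()
    for a, b in zip(h, h[1:]):
        adjacencies.add((a, b))
        adjacencies.add((-b, -a))
    # Each adjacent pair of g that is not conserved in h is a breakpoint.
    return sum((a, b) not in adjacencies for a, b in zip(g, g[1:]))


def exemplar_choices(genome):
    """Yield every reduced genome that keeps exactly one copy per gene family."""
    positions = {}
    for index, gene in enumerate(genome):
        positions.setdefault(abs(gene), []).append(index)
    families = sorted(positions)
    for picks in product(*(positions[f] for f in families)):
        yield [genome[i] for i in sorted(picks)]


def exemplar_breakpoint_distance(g, h):
    """Exhaustive search: minimum breakpoint count over all exemplar choices."""
    if {abs(x) for x in g} != {abs(x) for x in h}:
        raise ValueError("both genomes must contain the same gene families")
    return min(
        breakpoints(x, y)
        for x in exemplar_choices(g)
        for y in exemplar_choices(h)
    )


# Family 2 has two copies in g; keeping the second copy gives 1 3 2 4,
# which matches h exactly.
print(exemplar_breakpoint_distance([1, 2, 3, 2, 4], [1, 3, 2, 4]))  # prints 0
```

The number of exemplar choices is the product of the family sizes, so this enumeration grows exponentially with the number of duplicated families. That is consistent with the hardness result stated above and is the reason pruning strategies are needed in practice.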
This paper proposes a divide-and-conquer approach for calculating the exemplar breakpoint distance between two genomes with multiple gene families. Combining this approach with Sankoff's branch-and-bound technique yields a practical program for computing that distance. Tests on both simulated and real datasets show that the program is much more efficient than the existing program, which relies on the branch-and-bound technique alone.
Code for the program is available from the authors.