School of Computer Science and Technology, Shandong University, Jinan, Shandong 210500, China.
IEEE/ACM Trans Comput Biol Bioinform. 2012 Jul-Aug;9(4):1220-9. doi: 10.1109/TCBB.2012.57.
Motivated by the trend of genome sequencing without completing the sequence of the whole genomes, a problem on filling an incomplete multichromosomal genome (or scaffold) I with respect to a complete target genome G was studied. The objective is to minimize the resulting genomic distance between I' and G, where I' is the corresponding filled scaffold. We call this problem the onesided scaffold filling problem. In this paper, we conduct a systematic study for the scaffold filling problem under the breakpoint distance and its variants, for both unichromosomal and multichromosomal genomes (with and without gene repetitions). When the input genome contains no gene repetition (i.e., is a fragment of a permutation), we show that the two-sided scaffold filling problem (i.e., G is also incomplete) is polynomially solvable for unichromosomal genomes under the breakpoint distance and for multichromosomal genomes under the genomic (or DCJ--Double-Cut-and-Join) distance. However, when the input genome contains some repeated genes, even the one-sided scaffold filling problem becomes NP-complete when the similarity measure is the maximum number of adjacencies between two sequences. For this problem, we also present efficient constant-factor approximation algorithms: factor-2 for the general case and factor 1.33 for the one-sided case.
受基因组测序趋势的启发,在不完成整个基因组序列的情况下,研究了一个关于用完整目标基因组 G 填充不完整的多染色体基因组(或支架)I 的问题。目标是使 I'和 G 之间的基因组距离最小化,其中 I'是相应的填充支架。我们将这个问题称为单边支架填充问题。在本文中,我们对支架填充问题进行了系统的研究,包括断点距离及其变体,适用于单染色体和多染色体基因组(有和没有基因重复)。当输入基因组不含基因重复(即,是排列的一个片段)时,我们证明了在断点距离下,双边支架填充问题(即 G 也是不完整的)对于单染色体基因组是多项式可解的,对于基因组(或 DCJ--Double-Cut-and-Join)距离下的多染色体基因组也是多项式可解的。然而,当输入基因组包含一些重复基因时,即使相似性度量是两个序列之间邻接的最大数量,单边支架填充问题也变得 NP 完全。对于这个问题,我们还提出了有效的常数因子近似算法:对于一般情况为 2 倍,对于单边情况为 1.33 倍。