Jiang Minghui
Department of Computer Science, Utah State University, Logan, Utah 84322, USA.
J Comput Biol. 2011 Sep;18(9):1077-86. doi: 10.1089/cmb.2011.0097.
Given two genomes with duplicate genes, Zero Exemplar Distance is the problem of deciding whether the two genomes can be reduced to the same genome without duplicate genes by deleting all but one copy of each gene in each genome. Blin, Fertin, Sikora, and Vialette recently proved that Zero Exemplar Distance for monochromosomal genomes is NP-hard even if each gene appears at most twice in each genome, thereby settling an important open question on genome rearrangement in the exemplar model. In this article, we give a very simple alternative proof of this result. We also study Zero Exemplar Distance for multichromosomal genomes without gene order, and prove the analogous result that it is NP-hard even if each gene appears at most twice in each genome. On the positive side, we show that both variants of Zero Exemplar Distance admit polynomial-time algorithms if each gene appears exactly once in one genome and at least once in the other genome. In addition, we present a polynomial-time algorithm for the related problem Exemplar Longest Common Subsequence in the special case that each mandatory symbol appears exactly once in one input sequence and at least once in the other input sequence. This answers an open question of Bonizzoni et al. We also show that Zero Exemplar Distance for multichromosomal genomes without gene order is fixed-parameter tractable in the general case if the parameter is the maximum number of chromosomes in each genome.
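The problem definition and the easy one-sided case can be illustrated with a minimal Python sketch. It is not taken from the article and makes simplifying assumptions: a monochromosomal genome is modeled as a plain sequence of unsigned genes, and two genomes count as the same only if the sequences are equal. The function names are hypothetical. Under these assumptions, when every gene occurs exactly once in `g1`, that genome is already its own exemplar, so the question reduces to whether `g1` is a subsequence of `g2` over the same gene set, which a greedy scan settles in linear time. The brute-force function follows the definition directly and takes exponential time in general.

```python
from itertools import product


def exemplars(genome):
    """Yield every exemplar of `genome`: keep exactly one copy of each gene."""
    positions = {}
    for index, gene in enumerate(genome):
        positions.setdefault(gene, []).append(index)
    # Pick one retained position per gene, then read the genes in genome order.
    for choice in product(*positions.values()):
        yield tuple(genome[i] for i in sorted(choice))


def zero_exemplar_distance_bruteforce(g1, g2):
    """Exponential-time check straight from the definition."""
    return bool(set(exemplars(g1)) & set(exemplars(g2)))


def zero_exemplar_distance_one_sided(g1, g2):
    """Linear-time check, assuming every gene occurs exactly once in g1."""
    if len(set(g1)) != len(g1):
        raise ValueError("g1 must contain each gene exactly once")
    if set(g1) != set(g2):
        return False
    remaining = iter(g2)
    # `gene in remaining` advances the iterator to just past the first match,
    # so this is the standard greedy subsequence test.
    return all(gene in remaining for gene in g1)


if __name__ == "__main__":
    print(zero_exemplar_distance_one_sided("abc", "abacb"))
    print(zero_exemplar_distance_one_sided("abc", "cbaab"))
```

On small inputs the two functions can be compared against each other, which is a convenient sanity check for the one-sided shortcut.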