Gupta Arvind, Karimi Mohammad M, Manuch Ján, Stacho Ladislav, Zhao Xiaohong
Department of Computer Sciences, University of British Columbia, Vancouver, BC, Canada.
J Comput Biol. 2010 Oct;17(10):1435-49. doi: 10.1089/cmb.2009.0117.
The problem of determining haplotypes from genotypes has gained considerable prominence in the research community since the beginning of the HapMap project. Here the focus is on determining the sets of SNP values of individual chromosomes (haplotypes), since such information better captures the genetic causes of diseases. One of the main algorithmic tools for haplotyping is based on the assumption that the evolutionary history of the original haplotypes satisfies perfect phylogeny. This tool can be applied only to individual blocks of chromosomes, within which recombinations are assumed not to happen. However, exact determination of blocks is usually not possible. It would be desirable to develop a haplotyping method that can account for recombinations and thus can be applied to multiblock sections of chromosomes. A natural candidate for such a method is haplotyping via phylogenetic networks (which model recombinations) or their simplified version, galled-tree networks. However, even haplotyping via galled-tree networks appears hard, as efficient algorithms exist only for very special cases: the galled-tree network has either a single gall or only small galls with two mutations each. Building on our previous results, we show that, in general, haplotyping via galled-tree networks is NP-complete, and that it remains NP-complete when galls are allowed to have at most k mutations, for any k ≥ 3.
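For readers new to the setting, the perfect-phylogeny haplotyping problem that this work builds on can be stated concretely. Given genotypes over SNP sites, choose a pair of haplotypes explaining each genotype so that the full set of haplotypes admits a perfect phylogeny. The sketch below is a brute-force illustration only. It is not the authors' method, it is not one of the known polynomial-time algorithms for the single-block case, and it does not model galled-tree networks. It assumes the common 0/1/2 genotype encoding (2 = heterozygous site) and uses the four-gamete test as the perfect-phylogeny check for binary data under the infinite-sites model, with no fixed root. The function names are hypothetical.

```python
from itertools import product


def resolutions(genotype):
    """Yield each unordered pair of haplotypes that explains a genotype.

    Assumed encoding: 0 and 1 are homozygous sites, 2 is a heterozygous site.
    """
    het = [i for i, g in enumerate(genotype) if g == 2]
    if not het:
        h = tuple(genotype)
        yield h, h
        return
    # Fix the first heterozygous site to 0 in h1 so each unordered pair appears once.
    for bits in product((0, 1), repeat=len(het) - 1):
        h1, h2 = list(genotype), list(genotype)
        for site, b in zip(het, (0,) + bits):
            h1[site] = b
            h2[site] = 1 - b
        yield tuple(h1), tuple(h2)


def four_gamete_ok(haplotypes):
    """True iff no pair of sites shows all four gametes 00, 01, 10, 11."""
    if not haplotypes:
        return True
    m = len(haplotypes[0])
    for i in range(m):
        for j in range(i + 1, m):
            if len({(h[i], h[j]) for h in haplotypes}) == 4:
                return False
    return True


def pph_brute_force(genotypes):
    """Return one perfect-phylogeny-compatible resolution, or None.

    Exponential in the number of heterozygous sites; for illustration only.
    """
    for choice in product(*(list(resolutions(g)) for g in genotypes)):
        haps = [h for pair in choice for h in pair]
        if four_gamete_ok(haps):
            return list(choice)
    return None


print(pph_brute_force([[2, 2], [0, 1], [1, 0]]))
```

In the usage line, the genotype `[2, 2]` could be resolved as `(0,0)/(1,1)` or `(0,1)/(1,0)`. Only the second choice avoids all four gametes once the homozygous genotypes `[0, 1]` and `[1, 0]` are added, so the search returns that resolution.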