Brown Daniel G, Harrower Ian M
David R. Cheriton School of Computer Science, University of Waterloo, 200 University Ave. West, Waterloo, ON N2L 3G1 Canada.
IEEE/ACM Trans Comput Biol Bioinform. 2006 Apr-Jun;3(2):141-54. doi: 10.1109/TCBB.2006.24.
In 2003, Gusfield introduced the Haplotype Inference by Pure Parsimony (HIPP) problem and presented an integer program (IP) that quickly solved many simulated instances of the problem. Although it solved well on small instances, Gusfield's IP can be of exponential size in the worst case. Several authors have presented polynomial-sized IPs for the problem. In this paper, we further the work on IP approaches to HIPP. We extend the existing polynomial-sized IPs by introducing several classes of valid cuts for the IP. We also present a new polynomial-sized IP formulation that is a hybrid between two existing IP formulations and inherits many of the strengths of both. Many problems that are too complex for the exponential-sized formulations can still be solved in our new formulation in a reasonable amount of time. We provide a detailed empirical comparison of these IP formulations on both simulated and real genotype sequences. Our formulation can also be extended in a variety of ways to allow errors in the input or model the structure of the population under consideration.
2003年,古斯菲尔德提出了纯简约单倍型推断(HIPP)问题,并给出了一个整数规划(IP),该规划能快速解决该问题的许多模拟实例。尽管它在小实例上解决得很好,但古斯菲尔德的整数规划在最坏情况下可能具有指数规模。几位作者针对该问题提出了多项式规模的整数规划。在本文中,我们进一步开展了针对HIPP的整数规划方法的研究。我们通过为整数规划引入几类有效割平面来扩展现有的多项式规模的整数规划。我们还提出了一种新的多项式规模的整数规划公式,它是两个现有整数规划公式的混合体,继承了两者的许多优点。许多对于指数规模公式来说过于复杂的问题,在我们的新公式中仍然可以在合理的时间内得到解决。我们对这些整数规划公式在模拟和真实基因型序列上进行了详细的实证比较。我们的公式还可以通过多种方式进行扩展,以允许输入中存在错误或对所考虑群体的结构进行建模。