Department of Computer Engineering and Information Technology, Isfahan University of Technology, Isfahan 84156-83111, Iran.
J Theor Biol. 2012 Apr 7;298:122-30. doi: 10.1016/j.jtbi.2012.01.003. Epub 2012 Jan 12.
Given a set of aligned fragments, haplotype assembly is the problem of reconstructing the haplotypes from which the fragments were read. The problem is important because haplotypes carry SNP information, which is essential to many genomic analyses, such as studies of potential associations between diseases and genetic variations. The current state-of-the-art haplotype assembly algorithm, HapSAT, does not exploit genotype information and takes only a read matrix as input. However, the importance of haplotypes and the low cost of genotype information motivate the use of genotype information to obtain more accurate haplotypes. In this paper, an improved haplotype assembly method, xGenHapSAT, is proposed, which exploits xor genotype information for more accurate haplotype assembly. Xor genotype information is even less expensive to obtain than full genotype information, for example with the Denaturing High-Performance Liquid Chromatography (DHPLC) technique. It is shown that using this inexpensively obtainable information significantly improves the accuracy of the assembled haplotypes. In addition, xGenHapSAT adopts a new, more efficient Max-2-SAT formulation, which on average increases the speed of the algorithm. Moreover, the proposed xGenHapSAT method supersedes the current state-of-the-art haplotype assembly method based on genotype information. Finally, our haplotype assembly software, HapSoft, which includes both xGenHapSAT and HapSAT, is made freely available for research purposes.
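To make the role of xor genotype information concrete, the following minimal sketch shows how an xor genotype (1 at heterozygous sites, 0 at homozygous sites) constrains the pair of haplotypes fitted to a read matrix. It is not the Max-2-SAT formulation of xGenHapSAT: it uses brute-force enumeration and assumes a minimum error correction (MEC) style cost, and all function names and the string encoding (`0`/`1` alleles, `-` for uncovered sites) are hypothetical choices made for illustration.

```python
from itertools import product


def xor_genotype(h1, h2):
    """Return the xor genotype: '1' where the two haplotypes differ, else '0'."""
    return "".join("1" if a != b else "0" for a, b in zip(h1, h2))


def mismatches(read, hap):
    """Count covered sites (not '-') where the read disagrees with the haplotype."""
    return sum(1 for r, h in zip(read, hap) if r != "-" and r != h)


def mec_cost(reads, h1, h2):
    """Assumed objective: each read is charged its mismatches to the closer haplotype."""
    return sum(min(mismatches(r, h1), mismatches(r, h2)) for r in reads)


def best_pair(reads, xor_gt):
    """Brute-force search over haplotype pairs consistent with the xor genotype.

    Fixing h1 determines h2: flip h1 at heterozygous sites, copy it elsewhere.
    Exponential in the number of sites, so only suitable for tiny examples.
    """
    best = None
    for bits in product("01", repeat=len(xor_gt)):
        h1 = "".join(bits)
        h2 = "".join(
            ("1" if b == "0" else "0") if x == "1" else b
            for b, x in zip(h1, xor_gt)
        )
        cost = mec_cost(reads, h1, h2)
        if best is None or cost < best[0]:
            best = (cost, h1, h2)
    return best


if __name__ == "__main__":
    # Toy read matrix over four SNP sites; '-' marks sites a read does not cover.
    reads = ["01--", "-11-", "--10", "10--", "-01-"]
    print(best_pair(reads, xor_gt="1100"))
```

The xor genotype halves the unknowns in this sketch: once one haplotype is chosen, the other is fully determined, so homozygous sites can no longer be "explained away" by assigning different alleles to the two haplotypes.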