Wang Wei-Bung, Jiang Tao
Computer Science, University of California - Riverside, 900 University Avenue, Riverside, California 92521, USA.
J Bioinform Comput Biol. 2011 Apr;9(2):339-65. doi: 10.1142/s0219720011005549.
Inferring the haplotypes of the members of a pedigree from their genotypes has been extensively studied. However, most studies do not consider genotyping errors and de novo mutations. In this paper, we study how to infer haplotypes from genotype data that may contain genotyping errors, de novo mutations, and missing alleles. We assume that there are no recombinants in the genotype data, which is usually true for tightly linked markers. We introduce a combinatorial optimization problem, called haplotype configuration with mutations and errors (HCME), which asks for haplotype configurations that are consistent with the given genotypes, incur no recombinants, and require the minimum number of mutations and errors. HCME is NP-hard. To solve the problem, we propose a heuristic algorithm, the core of which is an integer linear program (ILP) built on a system of linear equations over the Galois field GF(2). Our algorithm can detect and locate genotyping errors that cannot be detected by simply checking the Mendelian law of inheritance. The algorithm also corrects errors in genotypes/haplotypes rather than just detecting inconsistencies and deleting the loci involved. Our experimental results show that the algorithm can infer haplotypes with very high accuracy and recover 65% to 94% of genotyping errors, depending on the pedigree topology.
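To illustrate why a plain Mendelian check misses some genotyping errors, the following is a minimal sketch for a single locus in a father-mother-child trio. It is not the HCME algorithm from the paper; the function name `mendelian_consistent` and the encoding of genotypes as unordered pairs of integer alleles are assumptions made only for this example.

```python
from itertools import product

def mendelian_consistent(father, mother, child):
    """Return True if the child's unordered genotype can be formed by
    taking one allele from the father and one from the mother.

    Genotypes are pairs of allele labels, e.g. (1, 2). This encoding is
    an assumption for illustration, not the paper's data format.
    """
    child_sorted = tuple(sorted(child))
    return any(
        tuple(sorted((f, m))) == child_sorted
        for f, m in product(father, mother)
    )

# Both parents are homozygous for allele 1, so a child genotype
# containing allele 2 violates Mendelian inheritance and is flagged.
flagged = not mendelian_consistent((1, 1), (1, 1), (1, 2))

# Both parents are heterozygous. If the child's true genotype (1, 1)
# is misread as (1, 2), the trio is still Mendelian-consistent, so
# this kind of error passes the check unnoticed.
undetected = mendelian_consistent((1, 2), (1, 2), (1, 2))
```

Errors of the second kind are the ones that need information beyond a per-locus trio check, such as the zero-recombinant assumption across tightly linked markers described in the abstract.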