Norwegian University of Life Sciences, 1430 Ås, Norway.
Genetics. 2010 Aug;185(4):1441-9. doi: 10.1534/genetics.110.113936. Epub 2010 May 17.
A novel method, called linkage disequilibrium multilocus iterative peeling (LDMIP), is developed for the imputation of phase and missing genotypes. LDMIP performs an iterative peeling step for every locus, which accounts for the family data, and uses a forward-backward algorithm to accumulate information across loci. Marker similarity between haplotype pairs is then used to impute possible missing genotypes and phases, which relies on the linkage disequilibrium between closely linked markers. After this imputation step, the combined iterative peeling/forward-backward algorithm is applied again, and the cycle is repeated until convergence. The calculations per iteration scale linearly with the number of markers and the number of individuals in the pedigree, which makes LDMIP well suited to large numbers of markers and/or large numbers of individuals. Per-iteration calculations scale quadratically with the number of alleles, which implies that biallelic markers are preferred. In a situation with up to 15% randomly missing genotypes, the error rate of the imputed genotypes was <1% and approximately 99% of the missing genotypes were imputed. In another example, LDMIP was used to impute whole-genome sequence data consisting of 17,321 SNPs on a chromosome. Imputation of the sequence was based on the information of 20 (re)sequenced founder individuals and on genotyping their descendants for a panel of 3000 SNPs. The error rate of the imputed SNP genotypes was 10%. However, if the parents of these 20 founders are also sequenced, >99% of missing genotypes are imputed correctly.
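To illustrate the forward-backward step that accumulates information across loci, the following is a minimal sketch and not the LDMIP implementation. It assumes a two-state inheritance indicator per locus (which of a parent's two haplotypes was transmitted), a known recombination fraction between adjacent loci, and per-locus likelihoods supplied by the caller. The function name `forward_backward` and its inputs are hypothetical and do not come from the paper.

```python
def forward_backward(emissions, recomb):
    """Posterior probability of each inheritance state at every locus.

    emissions: list of [l0, l1] likelihoods of the data at each locus given
               state 0 or state 1; [1.0, 1.0] marks a missing genotype.
    recomb:    list of recombination fractions between adjacent loci
               (length len(emissions) - 1). Both inputs are assumed.
    """
    n = len(emissions)

    def transition(r):
        # Stay in the same state with probability 1 - r, switch with r.
        return [[1.0 - r, r], [r, 1.0 - r]]

    def normalise(v):
        s = v[0] + v[1]
        return [v[0] / s, v[1] / s]

    # Forward pass: uniform prior at the first locus, rescaled at each step.
    fwd = [None] * n
    fwd[0] = normalise([0.5 * emissions[0][0], 0.5 * emissions[0][1]])
    for k in range(1, n):
        t = transition(recomb[k - 1])
        fwd[k] = normalise([
            (fwd[k - 1][0] * t[0][j] + fwd[k - 1][1] * t[1][j]) * emissions[k][j]
            for j in range(2)
        ])

    # Backward pass: information from the loci to the right of locus k.
    bwd = [None] * n
    bwd[n - 1] = [1.0, 1.0]
    for k in range(n - 2, -1, -1):
        t = transition(recomb[k])
        bwd[k] = normalise([
            sum(t[i][j] * emissions[k + 1][j] * bwd[k + 1][j] for j in range(2))
            for i in range(2)
        ])

    # Combine both directions into per-locus posteriors.
    return [
        normalise([fwd[k][0] * bwd[k][0], fwd[k][1] * bwd[k][1]])
        for k in range(n)
    ]


# Three loci; the middle one is missing, its neighbours favour state 0.
posterior = forward_backward(
    [[0.99, 0.01], [1.0, 1.0], [0.99, 0.01]],
    [0.01, 0.01],
)
```

In this toy input the uninformative middle locus borrows information from its tightly linked neighbours, so its posterior for state 0 exceeds 0.99. The cost of each pass grows linearly with the number of loci, in line with the linear scaling in the number of markers stated in the abstract.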