Hoh J, Hodge S E
Laboratory of Statistical Genetics, The Rockefeller University, New York, NY 10021, USA.
Hum Hered. 2000 Nov-Dec;50(6):359-64. doi: 10.1159/000022941.
The extent of haplotype ambiguity in a string of single-nucleotide polymorphisms (SNPs) was quantified by Hodge et al. [Nat Genet 1999;21:360]. In their measure, the level of ambiguity increases with the number of loci and as loci become more polymorphic. That work assumed linkage equilibrium (LE). However, linkage disequilibrium (LD) provides additional information about the haplotypes at a site, thereby reducing the level of ambiguity. The ambiguity vanishes altogether when LD reaches its maximum value. Here, we introduce an ambiguity measure, Phi, that allows for LD between pairs of SNPs. We derive the formula Phi = 4 x_2 x_3 for ambiguity in individuals, where x_1, x_2, x_3 and x_4 are the probabilities of the A_1A_2, A_1B_2, B_1A_2 and B_1B_2 haplotypes, respectively, and, without loss of generality, x_1 x_4 >= x_2 x_3. Alternatively, Phi can be expressed in terms of the allele frequencies and the LD parameter delta. We also extend the formula to triads of two parents plus one child. We estimate Phi for relevant SNPs in the published lipoprotein lipase (LPL) gene dataset [Clark et al., Am J Hum Genet 1998;63:595; Nickerson et al., Nat Genet 1998;19:233], obtaining values ranging from 0 to 0.11 among adjacent pairs of sites. In genome-wide LD studies to map common disease genes, a dense map of SNPs may be used to detect association between a marker and disease. Measuring ambiguity can therefore help investigators design a more efficient map that minimizes ambiguity and the resulting loss of information.
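As an illustration of the individual-level formula, the following Python sketch computes Phi = 4 x_2 x_3 from the four haplotype probabilities, and also from allele frequencies and delta. It is a minimal sketch and not the authors' code. The function names are hypothetical. Two assumptions are made that the abstract does not state: the "without loss of generality" condition is handled by taking the smaller of the products x_1 x_4 and x_2 x_3, which amounts to relabeling the alleles at one locus, and delta is taken in the standard two-locus parameterization x_1 = p_1 p_2 + delta, with p_1 and p_2 the frequencies of A_1 and A_2. The triad extension is not covered.

```python
import math


def phi_from_haplotypes(x1, x2, x3, x4):
    """Ambiguity Phi = 4 * x2 * x3 for one pair of SNPs.

    x1..x4 are the probabilities of the A1A2, A1B2, B1A2 and B1B2
    haplotypes. Assumption: the abstract's condition x1*x4 >= x2*x3 is
    enforced by relabeling, i.e. by using the smaller of the two products.
    """
    freqs = (x1, x2, x3, x4)
    if min(freqs) < 0:
        raise ValueError("haplotype probabilities must be non-negative")
    if not math.isclose(sum(freqs), 1.0, abs_tol=1e-9):
        raise ValueError("haplotype probabilities must sum to 1")
    return 4.0 * min(x1 * x4, x2 * x3)


def haplotype_freqs(p1, p2, delta):
    """Haplotype probabilities from allele frequencies and LD.

    Assumption: standard parameterization, with p1 = freq(A1),
    p2 = freq(A2) and delta = x1 - p1 * p2.
    """
    q1, q2 = 1.0 - p1, 1.0 - p2
    return (p1 * p2 + delta,   # A1A2
            p1 * q2 - delta,   # A1B2
            q1 * p2 - delta,   # B1A2
            q1 * q2 + delta)   # B1B2


def phi_from_alleles(p1, p2, delta):
    """Phi expressed through allele frequencies and delta."""
    return phi_from_haplotypes(*haplotype_freqs(p1, p2, delta))


if __name__ == "__main__":
    # Linkage equilibrium with equally frequent alleles (delta = 0).
    print(phi_from_alleles(0.5, 0.5, 0.0))
    # Maximum LD for these allele frequencies: ambiguity vanishes.
    print(phi_from_alleles(0.5, 0.5, 0.25))
```

Under these assumptions, Phi is largest at linkage equilibrium and falls to zero as delta approaches its maximum, which matches the qualitative behavior described above.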