Department of Statistics, University of Auckland, Auckland 1142, New Zealand.
Genetics. 2010 Aug;185(4):1337-44. doi: 10.1534/genetics.110.116681. Epub 2010 May 10.
We propose a multilocus version of F(ST) and a measure of haplotype diversity, both based on localized haplotype clusters. Specifically, we use haplotype clusters identified with BEAGLE, a program that implements a hidden Markov model for localized haplotype clustering and performs several functions, including inference of haplotype phase. We apply this methodology to HapMap phase 3 data. With this haplotype-cluster approach, African populations have the highest diversity and the lowest divergence from the ancestral population, East Asian populations have the lowest diversity and the highest divergence, and the other populations (European, Indian, and Mexican) have intermediate levels of diversity and divergence. These relationships accord with expectations based on other studies and on accepted models of human history. In contrast, the population-specific F(ST) estimates obtained directly from single-nucleotide polymorphisms (SNPs) do not reflect these expected relationships. We show that SNP ascertainment bias has less impact on the proposed haplotype-cluster-based F(ST) than on the SNP-based version, which offers a potential explanation for these results. Thus, these new measures of F(ST) and haplotype-cluster diversity provide an important new tool for population genetic analysis of high-density SNP data.
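The abstract does not state the estimators themselves. As a minimal sketch, assume every sampled haplotype has already been assigned a cluster label at each locus (for example, from BEAGLE's clustering; the input layout used here is hypothetical). Cluster labels can then be treated like multi-allelic states. This sketch uses matching probabilities: M_W is the probability that two distinct haplotypes from the same population fall in the same cluster, and M_B is the average probability that haplotypes from two different populations do. Diversity is taken as 1 - M_W, and a population-specific F(ST) as (M_W - M_B) / (1 - M_B), combined across loci as a ratio of sums. These formulas, the combination rule, and all function names are assumptions for illustration, not necessarily the paper's exact estimator.

```python
from collections import Counter
from itertools import combinations


def within_match(labels):
    """Probability that two distinct haplotypes from one sample share a cluster."""
    n = len(labels)
    counts = Counter(labels)
    return sum(c * (c - 1) for c in counts.values()) / (n * (n - 1))


def between_match(a, b):
    """Probability that one haplotype from each of two samples share a cluster."""
    ca, cb = Counter(a), Counter(b)
    return sum(ca[k] * cb[k] for k in ca) / (len(a) * len(b))


def cluster_fst_and_diversity(loci):
    """Hypothetical helper: population-specific F_ST and cluster diversity.

    loci: list of dicts, one per locus, mapping population name to a list of
    cluster labels (one label per haplotype). Assumes at least two populations.
    Loci are combined as a ratio of sums for F_ST (an assumption of this sketch)
    and diversity is averaged over loci.
    """
    pops = sorted(loci[0])
    pairs = list(combinations(pops, 2))
    num = {p: 0.0 for p in pops}
    den = 0.0
    div = {p: 0.0 for p in pops}
    for locus in loci:
        # Average between-population matching probability at this locus.
        m_b = sum(between_match(locus[x], locus[y]) for x, y in pairs) / len(pairs)
        den += 1.0 - m_b
        for p in pops:
            m_w = within_match(locus[p])
            num[p] += m_w - m_b
            div[p] += (1.0 - m_w) / len(loci)
    fst = {p: num[p] / den for p in pops}
    return fst, div


# Toy usage with made-up cluster labels for two populations at two loci.
toy_loci = [
    {"A": [0, 1, 2, 3, 0, 1], "B": [0, 0, 0, 0, 1, 0]},
    {"A": [0, 1, 0, 1, 2, 2], "B": [1, 1, 1, 1, 1, 0]},
]
fst, div = cluster_fst_and_diversity(toy_loci)
print(fst, div)
```

In this toy input, population A spreads its haplotypes over many clusters and population B concentrates them in one. As a result, A has higher diversity and lower F(ST) than B, which is the same qualitative pattern the abstract reports for African versus East Asian populations. With this matching-probability form, a highly diverse population can have a negative population-specific F(ST).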