Lohmueller Kirk E, Bustamante Carlos D, Clark Andrew G
Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York 14853, USA.
Genetics. 2009 May;182(1):217-31. doi: 10.1534/genetics.108.099275. Epub 2009 Mar 2.
We propose a novel approximate-likelihood method to fit demographic models to human genomewide single-nucleotide polymorphism (SNP) data. We divide the genome into windows of constant genetic map width and then tabulate the number of distinct haplotypes and the frequency of the most common haplotype for each window. We summarize the data by the genomewide joint distribution of these two statistics, termed the HCN statistic. Coalescent simulations are used to generate the expected HCN statistic for different demographic parameters. The HCN statistic provides additional information for disentangling complex demography beyond statistics based on single-SNP frequencies. Application of our method to simulated data shows that it can reliably infer parameters from growth and bottleneck models, even in the presence of recombination hotspots, provided the hotspots are properly modeled. We also examined how practical problems with genomewide data sets, such as errors in the genetic map, haplotype phase uncertainty, and SNP ascertainment bias, affect our method. Several modifications of our method make it robust to these problems. We have applied our method to data collected by Perlegen Sciences and find evidence for a severe population size reduction in northwestern Europe starting 32,500-47,500 years ago.
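The windowing and tabulation step described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names (`window_summary`, `hcn_table`), the input layout (phased haplotypes as equal-length strings, SNP positions in centimorgans), and the choice to skip windows that contain no SNPs are all assumptions made for the example. It records the count of the most common haplotype rather than its frequency; dividing by the sample size gives the frequency.

```python
from collections import Counter


def window_summary(window_haplotypes):
    """Return (number of distinct haplotypes, count of the most common one)."""
    counts = Counter(window_haplotypes)
    return len(counts), max(counts.values())


def hcn_table(haplotypes, positions_cm, window_cm):
    """Tabulate the joint distribution of the two per-window statistics.

    haplotypes   -- list of equal-length strings, one per phased chromosome
    positions_cm -- genetic map position (cM) of each SNP, sorted ascending
    window_cm    -- window width in cM (hypothetical parameter name)
    """
    table = Counter()
    if not positions_cm:
        return table

    # Assign each SNP index to a window by its genetic map position.
    start = positions_cm[0]
    windows = {}
    for idx, pos in enumerate(positions_cm):
        window_id = int((pos - start) // window_cm)
        windows.setdefault(window_id, []).append(idx)

    # Assumption: windows without any SNP are simply not counted.
    for snp_indices in windows.values():
        window_haps = ["".join(h[i] for i in snp_indices) for h in haplotypes]
        table[window_summary(window_haps)] += 1
    return table


# Toy data: four chromosomes, four SNPs, two windows of 0.25 cM.
toy_haplotypes = ["0011", "0011", "0101", "1101"]
toy_positions = [0.0, 0.1, 0.3, 0.35]
toy_table = hcn_table(toy_haplotypes, toy_positions, window_cm=0.25)
for (n_distinct, top_count), n_windows in sorted(toy_table.items()):
    print(n_distinct, top_count, n_windows)
```

In the method described above, a table of this kind computed from the observed data would be compared with tables generated from coalescent simulations under candidate demographic parameters; the simulation and likelihood steps are outside the scope of this sketch.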