Department of Biostatistics, Copenhagen University, Copenhagen, Denmark.
Mol Biol Evol. 2010 Nov;27(11):2534-47. doi: 10.1093/molbev/msq148. Epub 2010 Jun 17.
Chip-based high-throughput genotyping has facilitated genome-wide studies of genetic diversity. Many studies have used these large data sets to make inferences about the demographic history of human populations using measures of genetic differentiation such as F(ST) or principal component analyses. However, single nucleotide polymorphism (SNP) chip data suffer from ascertainment biases caused by the SNP discovery process, in which a small number of individuals from selected populations are used as discovery panels. In this study, we investigate the effect of ascertainment bias on inferences regarding genetic differentiation among populations in one of the common genome-wide genotyping platforms. We generate SNP genotyping data for individuals that have previously been subject to partial genome-wide Sanger sequencing and compare inferences based on genotyping data to inferences based on direct sequencing. We also analyze publicly available genome-wide data. We demonstrate that ascertainment biases distort measures of human diversity and may change conclusions drawn from these measures in sometimes unexpected ways. We also show that details of the genotype-calling algorithms can have a surprisingly large effect on population genetic inferences. We present a correction of the spectrum for the widely used Affymetrix SNP chips, but we also show that such corrections are difficult to generalize among studies.
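The ascertainment effect described in the abstract can be illustrated with a small toy simulation. The sketch below is not the study's method or its correction. Every modeling choice in it is an assumption made for illustration: an ancestral allele frequency spectrum with density proportional to 1/p, a Balding-Nichols beta model for the two population frequencies, a discovery panel of four chromosomes drawn from population 1 only, and the ratio-of-averages Hudson estimator for F(ST). All function and parameter names are hypothetical.

```python
import random


def hudson_fst(sites, n):
    """Ratio-of-averages Hudson F_ST from (count1, count2) pairs, n chromosomes per population."""
    num = den = 0.0
    for c1, c2 in sites:
        p1, p2 = c1 / n, c2 / n
        # Numerator: squared frequency difference with a finite-sample correction.
        num += (p1 - p2) ** 2 - p1 * (1 - p1) / (n - 1) - p2 * (1 - p2) / (n - 1)
        # Denominator: probability that one allele from each population differs.
        den += p1 * (1 - p2) + p2 * (1 - p1)
    return num / den


def mean_maf(sites, n):
    """Mean minor allele frequency in the pooled sample of 2n chromosomes."""
    total = 0.0
    for c1, c2 in sites:
        f = (c1 + c2) / (2 * n)
        total += min(f, 1 - f)
    return total / len(sites)


def simulate(n_sites=5000, n=40, panel=4, f=0.1, seed=1):
    """Return (all segregating sites, sites that pass the toy discovery panel)."""
    rng = random.Random(seed)
    scale = (1 - f) / f
    all_sites, chip_sites = [], []
    for _ in range(n_sites):
        # Assumed ancestral frequency on [0.01, 0.99) with density proportional to 1/p.
        anc = 0.01 * 99 ** rng.random()
        # Assumed Balding-Nichols model: each population drifts independently.
        p1 = rng.betavariate(anc * scale, (1 - anc) * scale)
        p2 = rng.betavariate(anc * scale, (1 - anc) * scale)
        c1 = sum(rng.random() < p1 for _ in range(n))
        c2 = sum(rng.random() < p2 for _ in range(n))
        if not 0 < c1 + c2 < 2 * n:
            continue  # not variable in the pooled sample
        all_sites.append((c1, c2))
        # Toy ascertainment: keep the SNP only if it is polymorphic
        # in a small discovery panel drawn from population 1.
        seen = sum(rng.random() < p1 for _ in range(panel))
        if 0 < seen < panel:
            chip_sites.append((c1, c2))
    return all_sites, chip_sites


if __name__ == "__main__":
    all_sites, chip_sites = simulate()
    for label, sites in (("all segregating sites", all_sites), ("ascertained sites", chip_sites)):
        print(label, len(sites), round(mean_maf(sites, 40), 3), round(hudson_fst(sites, 40), 3))
```

In this toy setting the small panel preferentially retains common variants, so the ascertained set has a higher mean minor allele frequency than the full set of segregating sites. How F(ST) shifts depends on the assumed parameters, which is consistent with the abstract's point that such biases are hard to correct in a general way.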