Departments of Biology and Genetics, University of Pennsylvania, Philadelphia, PA, USA.
Bioessays. 2013 Sep;35(9):780-6. doi: 10.1002/bies.201300014. Epub 2013 Jul 9.
Whole genome sequencing and SNP genotyping arrays can paint strikingly different pictures of demographic history and natural selection. This is because genotyping arrays contain biased sets of pre-ascertained SNPs. In this short review, we use comparisons between high-coverage whole genome sequences of African hunter-gatherers and data from genotyping arrays to highlight how SNP ascertainment bias distorts population genetic inferences. Sample sizes and the populations in which SNPs are discovered affect the characteristics of observed variants. We find that SNPs on genotyping arrays tend to be older and present in multiple populations. In addition, genotyping arrays cause allele frequency distributions to be shifted towards intermediate frequency alleles, and estimates of linkage disequilibrium are modified. Since population genetic analyses depend on allele frequencies, it is imperative that researchers are aware of the effects of SNP ascertainment bias. With this in mind, we describe multiple ways to correct for SNP ascertainment bias.
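The shift towards intermediate-frequency alleles described above can be illustrated with a small calculation. This is a minimal sketch, not the authors' analysis. It assumes a constant-size standard neutral model, under which the expected site frequency spectrum is proportional to 1/i, and a discovery panel that is a random subset of d of the n genotyped chromosomes. The function names and the choices n = 40 and d = 4 are hypothetical and used only for illustration.

```python
from math import comb


def neutral_sfs(n):
    """Expected unfolded SFS for n chromosomes under the standard neutral
    model: the proportion of SNPs with i derived copies is proportional to 1/i."""
    weights = [1.0 / i for i in range(1, n)]
    total = sum(weights)
    return [w / total for w in weights]


def ascertained_sfs(n, d):
    """SFS for the same n chromosomes after keeping only SNPs that are
    polymorphic in a discovery panel of d chromosomes.

    Assumption: the panel is drawn without replacement from the n chromosomes,
    so the chance that a SNP with i derived copies is discovered is
    1 - C(i, d)/C(n, d) - C(n - i, d)/C(n, d).
    """
    panel_draws = comb(n, d)
    kept = []
    for i, p in enumerate(neutral_sfs(n), start=1):
        all_derived = comb(i, d) / panel_draws        # panel shows only the derived allele
        all_ancestral = comb(n - i, d) / panel_draws  # panel shows only the ancestral allele
        kept.append(p * (1.0 - all_derived - all_ancestral))
    total = sum(kept)
    return [k / total for k in kept]


def intermediate_fraction(sfs, n, low=0.2, high=0.8):
    """Share of SNPs whose derived allele frequency lies in [low, high]."""
    return sum(p for i, p in enumerate(sfs, start=1) if low <= i / n <= high)


if __name__ == "__main__":
    n, d = 40, 4  # hypothetical sample and discovery panel sizes
    full = neutral_sfs(n)
    array_like = ascertained_sfs(n, d)
    print("singletons, all SNPs:        ", round(full[0], 3))
    print("singletons, ascertained SNPs:", round(array_like[0], 3))
    print("intermediate, all SNPs:        ", round(intermediate_fraction(full, n), 3))
    print("intermediate, ascertained SNPs:", round(intermediate_fraction(array_like, n), 3))
```

Under these assumptions, a small discovery panel rarely captures rare variants, so singletons are depleted and intermediate-frequency SNPs are enriched among the ascertained sites. Increasing d moves the ascertained spectrum back towards the full one.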