Ramírez-Soriano Anna, Nielsen Rasmus
Department of Biology, University of Copenhagen, 2100 Kbh Ø, Copenhagen, Denmark.
Genetics. 2009 Feb;181(2):701-10. doi: 10.1534/genetics.108.094060. Epub 2008 Dec 15.
Most single-nucleotide polymorphism (SNP) data suffer from an ascertainment bias caused by the process of SNP discovery followed by SNP genotyping. The final genotyped data are biased toward an excess of common alleles compared to directly sequenced data, making standard genetic methods of analysis inapplicable to this type of data. We here derive corrected estimators of the fundamental population genetic parameter θ = 4N(e)μ (N(e), effective population size; μ, mutation rate) on the basis of the average number of pairwise differences and on the basis of the number of segregating sites. We also derive the variances and covariances of these estimators and provide a corrected version of Tajima's D statistic. We reanalyze a human genomewide SNP data set and find substantial differences in the results with or without ascertainment bias correction.
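For orientation, the two estimators of θ named in the abstract, and the Tajima's D statistic built from them, can be sketched in their classical, uncorrected form (Watterson's estimator from the number of segregating sites, and the average number of pairwise differences). This is a minimal sketch only: it does not implement the ascertainment-bias corrections derived in the paper, and the function names and the input convention (one minor- or derived-allele count per segregating site, in a sample of n chromosomes) are assumptions made for illustration.

```python
import math


def harmonic_sums(n):
    """Return (a1, a2): sums of 1/i and 1/i^2 for i = 1..n-1."""
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / (i * i) for i in range(1, n))
    return a1, a2


def watterson_theta(num_segregating, n):
    """Classical estimator of theta from the number of segregating sites S."""
    a1, _ = harmonic_sums(n)
    return num_segregating / a1


def pairwise_theta(allele_counts, n):
    """Classical estimator of theta: average number of pairwise differences.

    allele_counts holds, for each segregating site, the number of sampled
    chromosomes (out of n) carrying one of the two alleles.
    """
    pairs = n * (n - 1) / 2.0
    return sum(k * (n - k) for k in allele_counts) / pairs


def tajimas_d(allele_counts, n):
    """Uncorrected Tajima's D (Tajima 1989) for a sample of n chromosomes."""
    s = len(allele_counts)
    if s == 0 or n < 4:
        raise ValueError("need at least one segregating site and n >= 4")
    a1, a2 = harmonic_sums(n)
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / (a1 * a1)
    e1 = c1 / a1
    e2 = c2 / (a1 * a1 + a2)
    diff = pairwise_theta(allele_counts, n) - s / a1
    return diff / math.sqrt(e1 * s + e2 * s * (s - 1))


if __name__ == "__main__":
    # Hypothetical toy data: 3 segregating sites in a sample of 4 chromosomes.
    counts = [1, 1, 2]
    print(watterson_theta(len(counts), 4))
    print(pairwise_theta(counts, 4))
    print(tajimas_d(counts, 4))
```

Because ascertained SNP panels over-represent common alleles, the pairwise-difference estimate tends to be inflated relative to the segregating-sites estimate, so an uncorrected D computed this way on genotyped data would be expected to shift upward. That is the kind of distortion the corrected estimators in the paper are meant to address.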