Brown Benjamin T, Woerner August, Wilder Jason A
Department of Biology, Williams College, Williamstown, MA 01267, USA.
J Mol Evol. 2007 Mar;64(3):375-85. doi: 10.1007/s00239-006-0149-0. Epub 2007 Jan 16.
Many East Asian human populations harbor a high-frequency deficiency allele for the aldehyde dehydrogenase 2 (ALDH2) enzyme, a critical protein involved in the metabolism of ethanol. Here we use resequencing and long-range SNP haplotype data from a Japanese sample to test whether patterns of nucleotide diversity and linkage disequilibrium at this locus are compatible with a standard neutral model of evolution. Examining the pattern of polymorphism at a locus such as this, where the frequency of a common allele is known a priori, introduces an ascertainment bias that must be corrected for in analyses of the frequency spectrum of polymorphisms. We apply a flexible and generally applicable simulation approach to correct for this bias in our ALDH2 data and to explore the effect of the bias on the commonly used summary statistics Tajima's D, Fu and Li's D, and Fay and Wu's H. Our study finds no evidence that the pattern of genetic variation at ALDH2 differs from that expected under a standard neutral model. However, our general examination of ascertainment bias indicates that a priori knowledge of segregating alleles greatly affects the expected distributions of summary statistics. Under many parameter combinations, ascertainment bias introduces an elevated rate of false positives when summary statistics are used to test for deviations from a standard neutral model. Nevertheless, we also show that over a wide range of conditions the power of all summary statistics can be greatly increased by incorporating prior knowledge of segregating alleles.
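For readers unfamiliar with the summary statistics named in the abstract, the following is a minimal sketch of Tajima's D, which contrasts two estimators of the population mutation parameter: the mean number of pairwise differences and Watterson's estimator based on the number of segregating sites. It is not the authors' code. The function name `tajimas_d` and the input format (one minor- or derived-allele count per segregating site) are assumptions made for illustration, and the sketch does not implement the ascertainment correction described above.

```python
import math


def tajimas_d(n, allele_counts):
    """Tajima's D for a sample of n chromosomes.

    allele_counts holds, for each segregating site, the number of sampled
    chromosomes carrying one of the two alleles (an integer in 1..n-1).
    This input format is an assumption of the sketch.
    """
    if n < 4:
        raise ValueError("this sketch expects at least 4 chromosomes")
    s = len(allele_counts)
    if s == 0:
        raise ValueError("D is undefined without segregating sites")
    if any(k < 1 or k > n - 1 for k in allele_counts):
        raise ValueError("each count must lie between 1 and n-1")

    # Mean number of pairwise differences, summed over sites.
    pi = sum(2.0 * k * (n - k) / (n * (n - 1)) for k in allele_counts)

    # Constants from Tajima (1989).
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / (i * i) for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / (a1 * a1)
    e1 = c1 / a1
    e2 = c2 / (a1 * a1 + a2)

    # Difference between the two estimators, scaled by its estimated
    # standard deviation.
    theta_w = s / a1
    return (pi - theta_w) / math.sqrt(e1 * s + e2 * s * (s - 1))
```

An excess of rare variants pulls D below zero, while an excess of intermediate-frequency variants pushes it above zero. Selecting a locus because it carries a known common allele shifts the frequency spectrum toward the latter, which is one way to see why the null distribution has to be adjusted.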