Zollner Sebastian, Pritchard Jonathan K
Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109, USA.
Am J Hum Genet. 2007 Apr;80(4):605-15. doi: 10.1086/512821. Epub 2007 Feb 16.
Genomewide association studies are now a widely used approach in the search for loci that affect complex traits. After detection of significant association, estimates of penetrance and allele-frequency parameters for the associated variant indicate the importance of that variant and facilitate the planning of replication studies. However, when these estimates are based on the original data used to detect the variant, the results are affected by an ascertainment bias known as the "winner's curse": the actual genetic effect is typically smaller than its estimate. This overestimation of the genetic effect may cause replication studies to fail because the necessary sample size is underestimated. Here, we present an approach that corrects for the ascertainment bias and generates estimates of the frequency of a variant and of its penetrance parameters. The method produces a point estimate and a confidence region for these parameters. We study the performance of this method using simulated data sets and show that it is possible to greatly reduce the bias in the parameter estimates, even when the original association study had low power. The uncertainty of the estimate decreases with increasing sample size, independent of the power of the original test for association. Finally, we show that application of the method to case-control data can improve the design of replication studies considerably.
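The ascertainment bias described above can be illustrated with a small simulation. The sketch below is not the correction method of the paper; it only demonstrates the winner's curse itself, under assumptions chosen for illustration: a single variant with a true allelic odds ratio of 1.2, a control allele frequency of 0.3, 1000 alleles sampled per group, and a one-sided two-proportion z-test with a threshold of 3.0. The function name `simulate_winners_curse` and all parameter names and defaults are hypothetical.

```python
import math
import random


def simulate_winners_curse(true_or=1.2, control_freq=0.3, n_alleles=1000,
                           n_studies=1000, z_threshold=3.0, seed=1):
    """Simulate many case-control studies of one variant (illustrative only).

    Returns (mean estimated OR over all studies,
             mean estimated OR over significant studies,
             number of significant studies).
    """
    rng = random.Random(seed)

    # Case allele frequency implied by the assumed true allelic odds ratio.
    odds = true_or * control_freq / (1.0 - control_freq)
    case_freq = odds / (1.0 + odds)

    all_ors = []
    significant_ors = []
    for _ in range(n_studies):
        # Risk-allele counts in cases (a) and controls (c), drawn as binomials.
        a = sum(rng.random() < case_freq for _ in range(n_alleles))
        c = sum(rng.random() < control_freq for _ in range(n_alleles))
        b = n_alleles - a
        d = n_alleles - c
        if min(a, b, c, d) == 0:
            continue  # odds ratio undefined for an empty cell; skip the study

        or_hat = (a * d) / (b * c)

        # Two-proportion z-test using the pooled allele frequency.
        pooled = (a + c) / (2.0 * n_alleles)
        se = math.sqrt(pooled * (1.0 - pooled) * 2.0 / n_alleles)
        z = (a - c) / n_alleles / se

        all_ors.append(or_hat)
        if z > z_threshold:
            significant_ors.append(or_hat)

    mean_all = sum(all_ors) / len(all_ors)
    mean_sig = (sum(significant_ors) / len(significant_ors)
                if significant_ors else float("nan"))
    return mean_all, mean_sig, len(significant_ors)


if __name__ == "__main__":
    mean_all, mean_sig, n_sig = simulate_winners_curse()
    print(f"mean OR, all studies:         {mean_all:.3f}")
    print(f"mean OR, significant studies: {mean_sig:.3f} (n = {n_sig})")
```

Under these settings the test has low power, so only studies that happened to overestimate the effect cross the threshold. The mean odds ratio among the significant studies therefore exceeds the true value, whereas the mean over all studies stays close to it. A replication study sized from the significant-only estimate would be underpowered, which is the problem the abstract describes.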