Marchini Jonathan, Cardon Lon R, Phillips Michael S, Donnelly Peter
Department of Statistics, University of Oxford, 1 South Parks Road, Oxford OX1 3TG, UK.
Nat Genet. 2004 May;36(5):512-7. doi: 10.1038/ng1337. Epub 2004 Mar 28.
Large-scale association studies hold substantial promise for unraveling the genetic basis of common human diseases. A well-known problem with such studies is the presence of undetected population structure, which can lead to both false positive results and failures to detect genuine associations. Here we examine approximately 15,000 genome-wide single-nucleotide polymorphisms typed in three population groups to assess the consequences of population structure on the coming generation of association studies. The consequences of population structure on association outcomes increase markedly with sample size. For the size of study needed to detect typical genetic effects in common diseases, even the modest levels of population structure within population groups cannot safely be ignored. We also examine one method for correcting for population structure (Genomic Control). Although it often performs well, it may not correct for structure if too few loci are used and may overcorrect in other settings, leading to substantial loss of power. The results of our analysis can guide the design of large-scale association studies.
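The abstract names Genomic Control but does not describe how it works. A minimal sketch of the commonly used median-based form follows; it is not the authors' analysis code. It assumes 1-df chi-square association statistics at loci presumed unlinked to the disease, estimates an inflation factor from their median, and divides the test statistics by it. The function names and the numeric values are hypothetical and chosen only for illustration.

```python
from statistics import median

# Median of a chi-square distribution with 1 degree of freedom (approximate).
CHI2_1DF_MEDIAN = 0.4549364


def inflation_factor(null_stats):
    """Estimate the inflation factor from 1-df chi-square statistics
    observed at loci assumed to be unassociated with the disease."""
    if not null_stats:
        raise ValueError("need at least one null-locus statistic")
    # Floor at 1 so that the correction never inflates a statistic.
    return max(1.0, median(null_stats) / CHI2_1DF_MEDIAN)


def genomic_control(test_stats, null_stats):
    """Divide each test statistic by the estimated inflation factor."""
    lam = inflation_factor(null_stats)
    return [s / lam for s in test_stats]


# Hypothetical values, for illustration only (not data from the study).
null_stats = [0.10, 0.35, 0.91, 1.40, 2.75]
candidate_stats = [12.0, 4.0]

lam = inflation_factor(null_stats)
corrected = genomic_control(candidate_stats, null_stats)
print(f"lambda = {lam:.3f}")
print("corrected statistics:", [round(s, 3) for s in corrected])
```

With only a handful of null loci, as here, the median is a noisy estimate, which is one way to see why the correction can fail when too few loci are used.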