Steffens Michael, Lamina Claudia, Illig Thomas, Bettecken Thomas, Vogler Rainer, Entz Patricia, Suk Eun-Kyung, Toliat Mohammad Reza, Klopp Norman, Caliebe Amke, König Inke R, Köhler Karola, Ludemann Jan, Diaz Lacava Amalia, Fimmers Rolf, Lichtner Peter, Ziegler Andreas, Wolf Andreas, Krawczak Michael, Nürnberg Peter, Hampe Jochen, Schreiber Stefan, Meitinger Thomas, Wichmann H-Erich, Roeder Kathryn, Wienker Thomas F, Baur Max P
Institute of Medical Biometry, Informatics and Epidemiology, Rheinische Friedrich-Wilhelms-University, Bonn, Germany.
Hum Hered. 2006;62(1):20-9. doi: 10.1159/000095850. Epub 2006 Sep 21.
To evaluate the relevance and necessity of accounting for population substructure in case-control association studies in central Europe, we analysed three samples drawn from different geographic areas of Germany. Two of the three samples, POPGEN (n = 720) and SHIP (n = 709), are from northern and north-eastern Germany, respectively, and the third, KORA (n = 730), is from southern Germany.
Population genetic differentiation was measured with classical F-statistics for marker sets of two kinds: genome-wide selected coding SNPs located in functional genes, and selectively neutral SNPs from 'genomic deserts'. The degree of stratification was quantified and compared across the genomic control approach [Devlin B, Roeder K: Biometrics 1999;55:997-1004], structured association [Pritchard JK, Stephens M, Donnelly P: Genetics 2000;155:945-959] and more sophisticated methods such as random forests [Breiman L: Machine Learning 2001;45:5-32].
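The genomic control approach cited above rescales per-SNP association statistics by an inflation factor λ estimated from many markers. The following is a minimal sketch under stated assumptions: it uses a 1-df allelic chi-square per SNP (Devlin and Roeder built the method on the Cochran-Armitage trend test; the simpler allelic test is substituted here), the constant 0.456 they used for the median of a 1-df chi-square distribution, and the common convention of never letting λ fall below 1. The function names are hypothetical.

```python
import statistics

# Median of a chi-square distribution with 1 df (about 0.4549); Devlin and Roeder use 0.456.
CHI2_1DF_MEDIAN = 0.456


def allelic_chi2(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Pearson chi-square (1 df, no continuity correction) for a 2x2 table of allele counts."""
    n = case_alt + case_ref + ctrl_alt + ctrl_ref
    num = n * (case_alt * ctrl_ref - case_ref * ctrl_alt) ** 2
    den = ((case_alt + case_ref) * (ctrl_alt + ctrl_ref)
           * (case_alt + ctrl_alt) * (case_ref + ctrl_ref))
    return num / den


def genomic_control(chi2_stats, floor_at_one=True):
    """Return (lambda, adjusted statistics) using the median-based inflation factor."""
    lam = statistics.median(chi2_stats) / CHI2_1DF_MEDIAN
    if floor_at_one:
        # Assumed convention: do not deflate statistics when lambda < 1.
        lam = max(lam, 1.0)
    return lam, [x / lam for x in chi2_stats]
```

For example, `allelic_chi2(60, 40, 40, 60)` is 8.0, and `genomic_control([0.912, 0.912, 0.912])` returns λ = 2.0 with every statistic halved to 0.456. In practice λ would be estimated from thousands of markers, ideally ones not expected to be associated with the trait.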
F-statistics showed low genetic differentiation between the samples along a north-south gradient within Germany (F_ST(KORA/POPGEN) = 1.7 × 10^-4; F_ST(KORA/SHIP) = 5.4 × 10^-4; F_ST(POPGEN/SHIP) = -1.3 × 10^-5).
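The abstract does not say which F_ST estimator was used. Purely as an illustration, the sketch below assumes the Weir and Cockerham (1984) estimator θ, computed from genotype counts in each sample and combined across SNPs as a ratio of summed variance components. `GenotypeCounts`, `wc_components` and `fst_wc` are hypothetical names. Because the estimator corrects for finite sample size, it can return slightly negative values when the true differentiation is essentially zero, which is one way a negative pairwise value such as the POPGEN/SHIP estimate can arise.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class GenotypeCounts:
    """Counts of the three genotypes (AA, Aa, aa) at one biallelic SNP in one sample."""
    hom_ref: int
    het: int
    hom_alt: int

    @property
    def n(self) -> int:
        return self.hom_ref + self.het + self.hom_alt


def wc_components(samples: Sequence[GenotypeCounts]) -> tuple[float, float, float]:
    """Weir & Cockerham (1984) variance components (a, b, c) for one SNP.

    `samples` holds one GenotypeCounts per subpopulation (at least two).
    """
    r = len(samples)
    ns = [s.n for s in samples]
    n_total = sum(ns)
    n_bar = n_total / r
    n_c = (n_total - sum(n * n for n in ns) / n_total) / (r - 1)
    # Frequency of allele A and observed heterozygote proportion in each sample.
    ps = [(2 * s.hom_ref + s.het) / (2 * s.n) for s in samples]
    hs = [s.het / s.n for s in samples]
    p_bar = sum(n * p for n, p in zip(ns, ps)) / n_total
    s2 = sum(n * (p - p_bar) ** 2 for n, p in zip(ns, ps)) / ((r - 1) * n_bar)
    h_bar = sum(n * h for n, h in zip(ns, hs)) / n_total
    pq = p_bar * (1 - p_bar)
    a = (n_bar / n_c) * (s2 - (pq - (r - 1) / r * s2 - h_bar / 4) / (n_bar - 1))
    b = (n_bar / (n_bar - 1)) * (pq - (r - 1) / r * s2
                                 - (2 * n_bar - 1) / (4 * n_bar) * h_bar)
    c = h_bar / 2
    return a, b, c


def fst_wc(loci: Sequence[Sequence[GenotypeCounts]]) -> float:
    """Multi-locus theta: sum of a over sum of (a + b + c) across SNPs."""
    num = den = 0.0
    for samples in loci:
        a, b, c = wc_components(samples)
        num += a
        den += a + b + c
    return num / den  # undefined if every SNP is monomorphic (den == 0)
```

Two samples fixed for opposite alleles, `fst_wc([[GenotypeCounts(50, 0, 0), GenotypeCounts(0, 0, 50)]])`, give 1.0. Two identical samples in Hardy-Weinberg proportions, `GenotypeCounts(25, 50, 25)` each, give -0.5/99 (about -0.005): the sampling correction overshoots when there is nothing to detect.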
Although the F_ST values are very small, indicating only a minor degree of population structure, and are too low for that structure to be detected by methods that do not use prior information on subpopulation membership, such as STRUCTURE [Pritchard JK, Stephens M, Donnelly P: Genetics 2000;155:945-959], they may still be a source of confounding due to population stratification.
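The confounding mechanism can be made concrete with a hypothetical SNP that has no effect on disease but differs slightly in allele frequency between a northern and a southern subpopulation. If cases and controls are drawn from the two regions in different proportions, their allele frequencies differ anyway. The sketch below plugs the expected allele counts (ignoring sampling noise) into a Pearson allelic chi-square. All frequencies, mixing proportions and sample sizes are invented for illustration and imply more differentiation than the F_ST values reported here; the function names are hypothetical.

```python
def allelic_chi2(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Pearson chi-square (1 df, no continuity correction) for a 2x2 table of allele counts."""
    n = case_alt + case_ref + ctrl_alt + ctrl_ref
    num = n * (case_alt * ctrl_ref - case_ref * ctrl_alt) ** 2
    den = ((case_alt + case_ref) * (ctrl_alt + ctrl_ref)
           * (case_alt + ctrl_alt) * (case_ref + ctrl_ref))
    return num / den


def null_snp_chi2(p_north, p_south, south_frac_cases, south_frac_controls,
                  n_cases, n_controls):
    """Chi-square from expected allele counts at a SNP unrelated to disease.

    Cases and controls are mixtures of two subpopulations, so any case-control
    frequency difference comes only from the different mixing proportions.
    """
    p_case = south_frac_cases * p_south + (1 - south_frac_cases) * p_north
    p_ctrl = south_frac_controls * p_south + (1 - south_frac_controls) * p_north
    case_alleles, ctrl_alleles = 2 * n_cases, 2 * n_controls
    return allelic_chi2(p_case * case_alleles, (1 - p_case) * case_alleles,
                        p_ctrl * ctrl_alleles, (1 - p_ctrl) * ctrl_alleles)


# Same north/south mix in cases and controls: no spurious signal (0.0).
print(null_snp_chi2(0.30, 0.35, 0.5, 0.5, 1000, 1000))
# Cases mostly southern, controls mostly northern: about 4.10.
print(null_snp_chi2(0.30, 0.35, 0.8, 0.2, 1000, 1000))
```

With unequal sampling the null SNP reaches a chi-square of about 4.10, above the 3.84 threshold for p < 0.05 at 1 df. Across many SNPs, the same effect inflates the whole distribution of test statistics, which is what genomic control measures.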