Yu Kai, Wang Zhaoming, Li Qizhai, Wacholder Sholom, Hunter David J, Hoover Robert N, Chanock Stephen, Thomas Gilles
Division of Cancer Epidemiology and Genetics, National Cancer Institute, Rockville, Maryland, United States of America.
PLoS One. 2008 Jul 2;3(7):e2551. doi: 10.1371/journal.pone.0002551.
Determining how much demanding classical epidemiologic criteria for control selection matter, and how to handle population stratification (PS) robustly, is a major challenge in the design and analysis of genome-wide association studies (GWAS). Empirical data from two GWAS in European Americans of the Cancer Genetic Markers of Susceptibility (CGEMS) project were used to evaluate the impact of PS in studies with different control selection strategies. In each of the two original case-control studies, nested in corresponding prospective cohorts, a minor confounding effect due to PS was observed (inflation factor λ of 1.025 and 1.005). In contrast, when the control groups were exchanged to mimic a cost-effective but theoretically less desirable control selection strategy, the confounding effects were larger (λ of 1.090 and 1.062). A panel of 12,898 autosomal SNPs common to both the Illumina and Affymetrix commercial platforms and with low local background linkage disequilibrium (pairwise r² < 0.004) was selected to infer population substructure with principal component analysis. A novel permutation procedure was developed for the correction of PS; it identified a smaller set of principal components and achieved better control of type I error (reducing λ to 1.032 and 1.006, respectively) than currently used methods. The overlap between the sets of SNPs in the bottom 5% of p-values based on the new test and on the test without PS correction was about 80%, with the majority of discordant SNPs having both ranks close to the threshold. Thus, for the CGEMS GWAS of prostate and breast cancer conducted in European Americans, PS does not appear to be a major problem in well-designed studies. A study using suboptimal controls can have acceptable type I error when an effective strategy for the correction of PS is employed.
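To make the two summary quantities in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes the inflation factor is the usual genomic-control estimate (median of the 1-degree-of-freedom chi-square association statistics divided by the median of the null chi-square distribution, about 0.4549), and it assumes the "bottom 5%" overlap is the shared fraction of the SNPs with the smallest p-values under two tests. The function names `inflation_factor` and `top_fraction_overlap` are hypothetical.

```python
import statistics

# Median of the chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364231195724


def inflation_factor(chi2_stats):
    """Genomic-control style lambda: observed median / null median.

    Assumes each statistic is a 1-df chi-square association test,
    one per SNP. Values near 1.0 suggest little inflation.
    """
    return statistics.median(chi2_stats) / CHI2_1DF_MEDIAN


def top_fraction_overlap(pvals_a, pvals_b, fraction=0.05):
    """Share of SNPs that fall in the smallest-`fraction` of p-values
    under both tests. The two lists must be indexed by the same SNPs.
    """
    if len(pvals_a) != len(pvals_b):
        raise ValueError("p-value lists must cover the same SNPs")
    k = max(1, int(len(pvals_a) * fraction))
    # Indices of the k smallest p-values under each test.
    top_a = set(sorted(range(len(pvals_a)), key=pvals_a.__getitem__)[:k])
    top_b = set(sorted(range(len(pvals_b)), key=pvals_b.__getitem__)[:k])
    return len(top_a & top_b) / k
```

Under these assumptions, an overlap of about 0.8 at `fraction=0.05` would correspond to the roughly 80% agreement reported between the corrected and uncorrected tests, and `inflation_factor` applied to each study's test statistics would yield values comparable to the λ figures quoted above.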