Lindström Sara, Loomis Stephanie, Turman Constance, Huang Hongyan, Huang Jinyan, Aschard Hugues, Chan Andrew T, Choi Hyon, Cornelis Marilyn, Curhan Gary, De Vivo Immaculata, Eliassen A Heather, Fuchs Charles, Gaziano Michael, Hankinson Susan E, Hu Frank, Jensen Majken, Kang Jae H, Kabrhel Christopher, Liang Liming, Pasquale Louis R, Rimm Eric, Stampfer Meir J, Tamimi Rulla M, Tworoger Shelley S, Wiggs Janey L, Hunter David J, Kraft Peter
Program in Genetic Epidemiology and Statistical Genetics, Harvard T.H. Chan School of Public Health, Boston, MA, United States of America.
Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, United States of America.
PLoS One. 2017 Mar 16;12(3):e0173997. doi: 10.1371/journal.pone.0173997. eCollection 2017.
The Nurses' Health Study (NHS), Nurses' Health Study II (NHSII), Health Professionals Follow-up Study (HPFS) and the Physicians' Health Study (PHS) have collected detailed longitudinal data on multiple exposures and traits for approximately 310,000 study participants over the last 35 years. Over 160,000 study participants across the cohorts have donated a DNA sample and, to date, 20,691 subjects have been genotyped as part of genome-wide association studies (GWAS) of twelve primary outcomes. However, these studies used six different GWAS arrays, making it difficult to conduct analyses of secondary phenotypes or to share controls across studies. To allow for secondary analyses of these data, we have created three new datasets merged by platform family and performed imputation using a common reference panel, the 1000 Genomes Phase I release. Here, we describe the methodology behind the data merging and imputation and present imputation quality statistics and association results from two GWAS of secondary phenotypes: body mass index (BMI) and venous thromboembolism (VTE). We observed the strongest BMI association for the FTO SNP rs55872725 (β = 0.45, p = 3.48 × 10^-22), and using a significance level of p = 0.05, we replicated 19 out of 32 known BMI SNPs. For VTE, we observed the strongest association for the SNP rs2040445 (OR = 2.17, 95% CI: 1.79-2.63, p = 2.70 × 10^-15), located downstream of F5, and also observed significant associations for the known ABO and F11 regions. This pooled resource can be used to maximize power in GWAS of phenotypes collected across the cohorts and for studying gene-environment interactions as well as rare phenotypes and genotypes.
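As a rough consistency check on the reported VTE result, the standard error and p-value can be approximately recovered from the odds ratio and its 95% confidence interval. The sketch below assumes a Wald-type interval that is symmetric on the log-odds scale and a two-sided normal test; the abstract does not state how the interval or p-value were computed, and the function name `wald_from_or_ci` is hypothetical.

```python
import math

# Hypothetical helper: assumes a Wald interval symmetric on the log-odds scale.
def wald_from_or_ci(odds_ratio, ci_low, ci_high, z_crit=1.959964):
    """Back out the log-OR standard error, z statistic and two-sided p-value."""
    log_or = math.log(odds_ratio)
    # The CI spans 2 * z_crit standard errors on the log scale.
    se = (math.log(ci_high) - math.log(ci_low)) / (2.0 * z_crit)
    z = log_or / se
    # Two-sided normal tail probability: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return se, z, p

# Values reported for rs2040445 in the abstract (rounded as published).
se, z, p = wald_from_or_ci(2.17, 1.79, 2.63)
print(f"SE(log OR) = {se:.4f}, z = {z:.2f}, p = {p:.2e}")
```

Because the published odds ratio and interval bounds are rounded to two decimals, the recovered p-value should match the reported one only in order of magnitude, not digit for digit.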