Hoffmann Thomas J, Sakoda Lori C, Shen Ling, Jorgenson Eric, Habel Laurel A, Liu Jinghua, Kvale Mark N, Asgari Maryam M, Banda Yambazi, Corley Douglas, Kushi Lawrence H, Quesenberry Charles P, Schaefer Catherine, Van Den Eeden Stephen K, Risch Neil, Witte John S
Department of Epidemiology and Biostatistics, University of California San Francisco, San Francisco, California, United States of America; Institute for Human Genetics, University of California San Francisco, San Francisco, California, United States of America.
Division of Research, Kaiser Permanente, Northern California, Oakland, California, United States of America.
PLoS Genet. 2015 Jan 28;11(1):e1004930. doi: 10.1371/journal.pgen.1004930. eCollection 2015 Jan.
An efficient approach to characterizing the disease burden of rare genetic variants is to impute them into large, well-phenotyped cohorts with existing genome-wide genotype data, using large sequenced reference panels. The success of this approach hinges on the accuracy of rare variant imputation, which remains controversial. For example, a recent study suggested that one cannot adequately impute the HOXB13 G84E mutation associated with prostate cancer risk (carrier frequency of 0.0034 in European ancestry participants in the 1000 Genomes Project). We show that by utilizing the 1000 Genomes Project data plus an enriched reference panel of mutation carriers, we were able to accurately impute the G84E mutation into a large cohort of 83,285 non-Hispanic White participants from the Kaiser Permanente Research Program on Genes, Environment and Health Genetic Epidemiology Research on Adult Health and Aging cohort. Imputation authenticity was confirmed via a novel classification and regression tree method, and then empirically validated by analyzing a subset of these subjects plus an additional 1,789 men from Kaiser specifically genotyped for the G84E mutation (r² = 0.57, 95% CI = 0.37–0.77). We then show the value of this approach by using the imputed data to investigate the impact of the G84E mutation on age-specific prostate cancer risk and on risk of fourteen other cancers in the cohort. The age-specific risk of prostate cancer among G84E mutation carriers was higher than among non-carriers. Risk estimates from Kaplan-Meier curves were 36.7% versus 13.6% by age 72, and 64.2% versus 24.2% by age 80, for G84E mutation carriers and non-carriers, respectively (p = 3.4 × 10⁻¹²). The G84E mutation was also associated with an increase in risk for the fourteen other most common cancers considered collectively (p = 5.8 × 10⁻⁴), and more so in cases diagnosed with multiple cancer types, both those including and not including prostate cancer, strongly suggesting pleiotropic effects.
[corrected].
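The abstract reports cumulative prostate cancer risk "by age 72" and "by age 80" read off Kaplan-Meier curves. As a minimal sketch of how such a figure can be obtained, the following computes 1 − S(age) from ages at diagnosis or censoring. The function name `kaplan_meier_risk` and the toy data are illustrative assumptions and are not taken from the study. The sketch also assumes everyone is at risk from the start, so it ignores delayed entry (left truncation), which a real age-scale cohort analysis would normally account for.

```python
from collections import Counter


def kaplan_meier_risk(ages, events, by_age):
    """Return the Kaplan-Meier cumulative risk 1 - S(by_age).

    ages   -- age at diagnosis (event) or at last follow-up (censored)
    events -- True if diagnosed at that age, False if censored
    Assumption: no delayed entry; every subject is at risk from the start.
    """
    diagnosed = Counter(a for a, e in zip(ages, events) if e)
    leaving = Counter(ages)  # each subject leaves the risk set at their recorded age
    at_risk = len(ages)
    survival = 1.0
    for age in sorted(leaving):
        if age > by_age:
            break
        d = diagnosed.get(age, 0)
        if d:
            # Product-limit step: multiply by the fraction still undiagnosed.
            survival *= 1.0 - d / at_risk
        at_risk -= leaving[age]
    return 1.0 - survival


# Toy data, invented for illustration only (not study data).
toy_ages = [60, 65, 70, 72, 75, 80]
toy_events = [True, False, True, False, True, False]

risk_by_72 = kaplan_meier_risk(toy_ages, toy_events, 72)
risk_by_80 = kaplan_meier_risk(toy_ages, toy_events, 80)
print(risk_by_72, risk_by_80)
```

For the toy data, the risk by age 72 works out by hand to 1 − (5/6)(3/4) = 0.375. Comparing carriers with non-carriers would mean running the same calculation separately for each group; the p-value quoted in the abstract would come from a separate test that this sketch does not implement.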