Gao Guimin, Pierce Brandon L, Olopade Olufunmilayo I, Im Hae Kyung, Huo Dezheng
Department of Public Health Sciences, University of Chicago, Chicago, United States of America.
Department of Human Genetics, University of Chicago, Chicago, United States of America.
PLoS Genet. 2017 Sep 28;13(9):e1006727. doi: 10.1371/journal.pgen.1006727. eCollection 2017 Sep.
Genome-wide association studies (GWAS) have identified more than 90 susceptibility loci for breast cancer, but the biology underlying these associations remains to be elucidated. Additional genetic factors for breast cancer have yet to be identified, but sample size constraints prevent traditional GWAS methods from detecting individual genetic variants with weak effects. To address this challenge, we used a gene-level, expression-based method implemented in the MetaXcan software to predict the expression of 11,536 genes from expression quantitative trait loci (eQTLs), and tested the genetically predicted expression of each gene for association with overall breast cancer risk and with estrogen receptor (ER)-negative breast cancer risk. Using GWAS datasets from a Challenge launched by the National Cancer Institute, we found that genetically predicted expression of TP53INP2 (tumor protein p53-inducible nuclear protein 2) at 20q11.22 was significantly associated with ER-negative breast cancer (Z = -5.013, p = 5.35×10⁻⁷; Bonferroni threshold = 4.33×10⁻⁶). The association was consistent across four GWAS datasets representing populations of European, African and Asian ancestry. Six single nucleotide polymorphisms (SNPs) are included in the model predicting TP53INP2 expression, and five of them were associated with ER-negative breast cancer, although none of the SNP-level associations reached genome-wide significance. In a replication study using a dataset outside the Challenge, the association between TP53INP2 and ER-negative breast cancer was also significant (p = 5.07×10⁻³). Expression of HP (16q22.2) showed a suggestive association with ER-negative breast cancer in the discovery phase (Z = 4.30, p = 1.70×10⁻⁵), although the association was not significant after Bonferroni adjustment.
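The gene-level test can be sketched from GWAS summary statistics alone. MetaXcan's summary-statistic mode (S-PrediXcan) is generally described as approximating the gene-level statistic as Z_g ≈ Σ_l w_l (σ_l / σ_g) (β̂_l / se(β̂_l)), where w_l are the eQTL prediction weights for the SNPs in a gene's model, σ_l is each SNP's genotype standard deviation, and σ_g² = wᵀΓw is the variance of predicted expression under the reference-panel SNP covariance Γ. The Python below is a minimal sketch under those assumptions, not the MetaXcan code itself; the function names and the six-SNP toy inputs are hypothetical. It also shows the two-sided p-value for a Z-score and the Bonferroni threshold 0.05 / 11,536 ≈ 4.33×10⁻⁶.

```python
import math

import numpy as np


def gene_level_z(weights, snp_sd, snp_z, ld_corr):
    """Approximate a gene-level Z-score from SNP-level GWAS summary statistics.

    Hypothetical helper sketching the S-PrediXcan-style formula:
        Z_g ~= sum_l w_l * (sigma_l / sigma_g) * z_l
    weights : eQTL prediction weights w_l for the SNPs in the gene's model
    snp_sd  : genotype standard deviations sigma_l from a reference panel
    snp_z   : GWAS Z-scores z_l = beta_l / se(beta_l)
    ld_corr : SNP-by-SNP correlation (LD) matrix from the same reference panel
    """
    w = np.asarray(weights, dtype=float)
    sd = np.asarray(snp_sd, dtype=float)
    z = np.asarray(snp_z, dtype=float)
    # Genotype covariance Gamma = diag(sd) @ R @ diag(sd)
    cov = np.outer(sd, sd) * np.asarray(ld_corr, dtype=float)
    # Standard deviation of genetically predicted expression: sqrt(w' Gamma w)
    sigma_g = math.sqrt(float(w @ cov @ w))
    return float(np.sum(w * sd / sigma_g * z))


def two_sided_p(z):
    """Two-sided normal p-value, 2 * (1 - Phi(|z|)), via the complementary error function."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests


if __name__ == "__main__":
    # Hypothetical six-SNP model; all values are invented for illustration only.
    weights = [0.12, -0.08, 0.05, 0.20, -0.03, 0.10]
    snp_sd = [0.45, 0.50, 0.38, 0.49, 0.30, 0.42]
    snp_z = [-2.1, 1.8, -1.2, -2.5, 0.4, -1.9]
    ld_corr = np.eye(6)  # assume independent SNPs for simplicity
    z_gene = gene_level_z(weights, snp_sd, snp_z, ld_corr)
    print(f"toy gene-level Z = {z_gene:.3f}, p = {two_sided_p(z_gene):.2e}")
    print(f"Bonferroni threshold for 11,536 genes: {bonferroni_threshold(0.05, 11536):.2e}")
    print(f"two-sided p for Z = 4.30: {two_sided_p(4.30):.2e}")
```

With real data, Γ would come from an LD reference panel matched to the study population; the identity LD matrix in the toy call is only a simplification and can misstate σ_g when the SNPs in a model are correlated.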
Of the 249 genes located within 250 kb of known breast cancer susceptibility loci identified by previous GWAS, 20 (8.0%) were significantly associated with ER-negative breast cancer (p < 0.05), compared with 582 (5.2%) of the 11,287 genes not close to previous GWAS loci. This study demonstrates that expression-based gene mapping is a promising approach for identifying cancer susceptibility genes.
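The near-versus-far comparison is simple arithmetic on the reported counts. The sketch below (Python; the helper name `enrichment_summary` is hypothetical) recomputes both proportions, checks that the two gene groups add up to the 11,536 genes tested, and reports the ratio of the proportions. As a reference point, if gene-level p-values were uniformly distributed under the null, about 5% of genes would fall below p < 0.05 by chance; the sketch computes only these descriptive quantities and no formal test of the difference.

```python
def enrichment_summary(sig_near, n_near, sig_far, n_far, alpha=0.05):
    """Compare the share of nominally significant genes near vs. far from known GWAS loci."""
    pct_near = 100.0 * sig_near / n_near
    pct_far = 100.0 * sig_far / n_far
    return {
        "total_genes": n_near + n_far,
        "pct_near": pct_near,
        "pct_far": pct_far,
        # Ratio of the two proportions (near / far)
        "ratio": pct_near / pct_far,
        # Percentage expected below alpha if p-values were uniform under the null
        "pct_expected_null": 100.0 * alpha,
    }


if __name__ == "__main__":
    # Counts as reported: 20 of 249 genes within 250 kb of known loci and
    # 582 of 11,287 genes not close to known loci reached p < 0.05.
    summary = enrichment_summary(20, 249, 582, 11287)
    for key, value in summary.items():
        print(f"{key}: {value:.2f}" if isinstance(value, float) else f"{key}: {value}")
```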