Division of Epidemiology, Department of Medicine, Vanderbilt Epidemiology Center, Vanderbilt-Ingram Cancer Center, Vanderbilt University Medical Center, Nashville, TN, USA.
Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, TX, 77030, USA.
Br J Cancer. 2024 Dec;131(11):1796-1804. doi: 10.1038/s41416-024-02879-1. Epub 2024 Oct 28.
Genome-wide association studies (GWAS) have identified more than 200 breast cancer risk-associated genetic loci, yet the causal genes and biological mechanisms for most loci remain elusive. Proteins, as final gene products, are pivotal in cellular function. In this study, we conducted a proteome-wide association study (PWAS) to identify proteins in breast tissue related to breast cancer risk.
We profiled the proteome in fresh frozen breast tissue samples from 120 cancer-free women of European ancestry from the Susan G. Komen Tissue Bank (KTB). Protein expression levels were log2-transformed and then normalized via quantile and inverse-rank transformations. GWAS data were also generated for these 120 samples. Together, these data were used to build statistical models that predict protein expression levels from cis-genetic variants using the elastic net method. The prediction models were then applied to GWAS summary statistics from 133,384 breast cancer cases and 113,789 controls to assess, using the S-PrediXcan method, the associations of genetically predicted protein expression levels with the risk of breast cancer overall and of its subtypes.
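The normalization and association steps described above can be illustrated with a minimal sketch. It is not the authors' pipeline. It assumes a rank-based inverse normal transform with an offset of 0.5, and it uses the published S-PrediXcan approximation, in which a protein's z-score is a weighted sum of per-variant GWAS z-scores scaled by variant and predicted-expression standard deviations. The function names, the offset, and the toy inputs are all hypothetical, and the quantile normalization and elastic net fitting steps are omitted.

```python
import math
from statistics import NormalDist


def inverse_rank_normalize(values, offset=0.5):
    """Map values to standard-normal quantiles by rank (ties share the average rank)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    start = 0
    while start < n:
        end = start
        while end + 1 < n and values[order[end + 1]] == values[order[start]]:
            end += 1
        average_rank = (start + end) / 2 + 1  # 1-based average rank of the tie group
        for k in range(start, end + 1):
            ranks[order[k]] = average_rank
        start = end + 1
    standard_normal = NormalDist()
    return [standard_normal.inv_cdf((r - offset) / n) for r in ranks]


def predicted_expression_z(weights, gwas_z, snp_cov):
    """S-PrediXcan-style z-score: sum_l w_l * sd_l * z_l / sd_g.

    weights : prediction-model weights for the cis variants (assumed given)
    gwas_z  : per-variant GWAS z-scores (beta / standard error)
    snp_cov : variant covariance matrix from a reference panel (assumed given)
    """
    n = len(weights)
    # Variance of the predicted expression: w' * Cov * w
    var_g = sum(weights[i] * snp_cov[i][j] * weights[j]
                for i in range(n) for j in range(n))
    numerator = sum(weights[l] * math.sqrt(snp_cov[l][l]) * gwas_z[l]
                    for l in range(n))
    return numerator / math.sqrt(var_g)


# Hypothetical raw intensities for one protein across three samples.
intensities = [8.0, 2.0, 4.0]
log2_levels = [math.log2(v) for v in intensities]
normalized = inverse_rank_normalize(log2_levels)

# Hypothetical one-variant model: the protein z-score equals the variant z-score.
z_single = predicted_expression_z([2.0], [3.0], [[0.25]])
```

With a single variant in the model, the scaling terms cancel and the protein-level z-score reduces to the variant's GWAS z-score, which is a convenient sanity check on the formula.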
A total of 6388 proteins were detected with high confidence (detection false discovery rate [FDR] p value < 0.01) in the normal breast tissue samples from the 120 women. Among the 5820 proteins detected in more than 80% of participants, prediction models were successfully built for 2060 proteins with R > 0.1 and P < 0.05. Among these 2060 proteins, five were significantly associated with overall breast cancer risk at an FDR p value < 0.1. For three of these five proteins, COPG1, DCTN3, and DDX6, the corresponding genes were located at least 1 megabase (Mb) away from the GWAS-identified breast cancer risk variants. COPG1 was associated with an increased risk of breast cancer with a p value of 8.54 × 10. Both DCTN3 and DDX6 were associated with a decreased risk of breast cancer with p values of 1.01 × 10 and 3.25 × 10, respectively. The corresponding genes for the remaining two proteins, LSP1 and DNAJA3, were located in previously GWAS-identified breast cancer risk loci. After adjusting for GWAS-identified risk variants, the association for DNAJA3 remained significant (p value of 9.15 × 10 and adjusted p value of 1.94 × 10). However, the association for LSP1 was no longer significant, with a p value of 0.62. Stratified analyses by breast cancer subtype identified three proteins, SMARCC1, LSP1, and NCKAP1L, associated with luminal A, luminal B, and ER-positive breast cancer. The gene for NCKAP1L was located at least 1 Mb away from the GWAS-identified breast cancer risk variants. After adjusting for GWAS-identified breast cancer risk variants, the association for LSP1 remained significant (adjusted p value of 6.43 × 10 for the luminal B subtype).
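The FDR threshold used above can be illustrated with a short sketch. The abstract does not state which FDR procedure was used, so the Benjamini-Hochberg adjustment below is an assumption, and the function name and the p-values are hypothetical rather than taken from the study.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


# Hypothetical association p-values for four proteins.
raw_p = [0.01, 0.04, 0.03, 0.5]
fdr_p = benjamini_hochberg(raw_p)
significant = [p < 0.1 for p in fdr_p]
```

Under this assumed procedure, a protein is reported when its adjusted value falls below the chosen cutoff, here 0.1 to match the threshold quoted for overall breast cancer risk.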
We conducted the first breast-tissue-based PWAS and identified seven proteins associated with breast cancer, including five proteins not previously implicated. These findings improve our understanding of the genetic mechanisms underlying breast cancer development.