Chang Yung-Han, Head S Taylor, Harrison Tabitha, Yu Yao, Huff Chad D, Pasaniuc Bogdan, Lindström Sara, Bhattacharya Arjun
Quantitative Sciences Program, The University of Texas MD Anderson Cancer Center UTHealth Houston Graduate School of Biomedical Sciences, Houston, TX, USA.
Department of Epidemiology, University of Texas MD Anderson Cancer Center, Houston, TX, USA.
medRxiv. 2024 Oct 30:2024.10.29.24316388. doi: 10.1101/2024.10.29.24316388.
Integrating genome-wide association study (GWAS) and transcriptomic datasets can help identify potential mediators for germline genetic risk of cancer. However, traditional methods have been largely unsuccessful because of an overreliance on total gene expression. These approaches overlook alternative splicing, which can produce multiple isoforms from the same gene, each with potentially different effects on cancer risk. Here, we integrate genetic and multi-tissue isoform-level gene expression data from the Genotype-Tissue Expression Project (GTEx, N = 108-574) with publicly available European-ancestry GWAS summary statistics (all N > 20,000 cases) to identify both isoform- and gene-level risk associations with six cancers (breast, endometrial, colorectal, lung, ovarian, prostate) and six related cancer subtype classifications (N = 12 total). Compared to traditional methods leveraging total gene expression, directly modeling isoform expression through transcriptome-wide association studies (isoTWAS) substantially increases discovery of transcriptomic mechanisms underlying genetic associations. Using the same RNA-seq datasets, isoTWAS identified 164% more significant unique gene associations compared to TWAS (6,163 and 2,336, respectively), with isoTWAS-prioritized genes enriched 4-fold for evolutionarily constrained genes (P = 6.1 × 10). isoTWAS tags transcriptomic associations at 52% more independent GWAS loci compared to TWAS across the six cancers. Additionally, isoform expression mediates an estimated 63% greater proportion of cancer risk SNP heritability compared to gene expression when evaluating cis-genetic influence on isoform expression. We highlight several notable isoTWAS associations that demonstrate GWAS colocalization at the isoform level but not at the gene level, including, (lung cancer), (colorectal), and (breast).
These results underscore the critical importance of modeling isoform-level expression to maximize discovery of genetic risk mechanisms for cancers.