Manjunath Mohith, Yan Jialu, Youn Yeoan, Drucker Kristen L, Kollmeyer Thomas M, McKinney Andrew M, Zazubovich Valter, Zhang Yi, Costello Joseph F, Eckel-Passow Jeanette, Selvin Paul R, Jenkins Robert B, Song Jun S
Department of Physics, University of Illinois at Urbana-Champaign, Urbana, Illinois, USA.
Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois, USA.
Neuro Oncol. 2021 Apr 12;23(4):638-649. doi: 10.1093/neuonc/noaa248.
Large-scale genome-wide association studies (GWAS) have implicated thousands of germline genetic variants in modulating individuals' risk of various diseases, including cancer. At least 25 risk loci have been identified for low-grade gliomas (LGGs), but their molecular functions remain largely unknown.
We hypothesized that GWAS loci contain causal single nucleotide polymorphisms (SNPs) that reside in accessible open chromatin regions and modulate the expression of target genes by perturbing the binding affinity of transcription factors (TFs). We performed an integrative analysis of genomic and epigenomic data from The Cancer Genome Atlas and other public repositories to identify candidate causal SNPs within linkage disequilibrium blocks of LGG GWAS loci. We assessed their potential regulatory role via in silico TF binding sequence perturbations, a convolutional neural network trained on TF binding data, and simulated annealing-based interpretation methods.
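The idea of an in silico TF binding sequence perturbation can be illustrated with a minimal sketch: score the reference and alternate alleles of a SNP against a position weight matrix (PWM) and compare the best motif match overlapping the variant. This is not the authors' pipeline. The count matrix, pseudocount, uniform background, and function names below are all hypothetical, the motif is a toy and not a real MAFF motif, and only the forward strand is scanned for brevity.

```python
import math

# Hypothetical 4-position count matrix (one dict per motif position).
# Toy numbers for illustration only; this is NOT a real TF motif.
COUNTS = [
    {"A": 2, "C": 1, "G": 16, "T": 1},
    {"A": 1, "C": 17, "G": 1, "T": 1},
    {"A": 16, "C": 1, "G": 2, "T": 1},
    {"A": 1, "C": 1, "G": 1, "T": 17},
]


def log_odds_matrix(counts, pseudocount=1.0, background=0.25):
    """Convert base counts to log2 odds against a uniform background (assumed)."""
    pwm = []
    for col in counts:
        total = sum(col.values()) + 4 * pseudocount
        pwm.append(
            {b: math.log2(((col[b] + pseudocount) / total) / background) for b in "ACGT"}
        )
    return pwm


def best_window_score(seq, pwm):
    """Highest PWM score over all forward-strand windows in seq."""
    width = len(pwm)
    return max(
        sum(pwm[i][seq[start + i]] for i in range(width))
        for start in range(len(seq) - width + 1)
    )


def allele_delta(left, ref, alt, right, pwm):
    """Alt-minus-ref change in the best motif score overlapping a single-base SNP.

    Flanks are trimmed to (motif width - 1) bases so that every scanned
    window contains the variant position.
    """
    flank = len(pwm) - 1
    left = left[len(left) - flank:]
    right = right[:flank]
    ref_score = best_window_score(left + ref + right, pwm)
    alt_score = best_window_score(left + alt + right, pwm)
    return alt_score - ref_score


PWM = log_odds_matrix(COUNTS)
# Made-up flanking sequence: the reference allele C completes the toy
# consensus GCAT, so the alternate allele T gives a negative delta.
delta = allele_delta("AAG", "C", "T", "ATT", PWM)
print(f"delta score (alt - ref): {delta:.2f}")
```

A strongly negative delta suggests the alternate allele weakens a predicted binding site, and a positive delta suggests it strengthens or creates one. A fuller analysis would also scan the reverse complement and use curated motifs with an empirical background model.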
We built an interactive website (http://education.knoweng.org/alg3/) summarizing the functional footprinting of 280 variants in 25 LGG GWAS regions, providing rich information for further computational and experimental scrutiny. As case studies, we identified PHLDB1 and SLC25A26 as candidate target genes of rs12803321 and rs11706832, respectively, and predicted the GWAS variant rs648044 to be the causal SNP modulating ZBTB16, a known tumor suppressor in multiple cancers. We showed that rs648044 likely perturbs the binding affinity of the TF MAFF, as supported by RNA interference and in vitro MAFF binding experiments.
The identified candidate (causal SNP, target gene, TF) triplets and the accompanying resource will help accelerate our understanding of the molecular mechanisms underlying genetic risk factors for gliomas.