Liu Yunxian, Walavalkar Ninad M, Dozmorov Mikhail G, Rich Stephen S, Civelek Mete, Guertin Michael J
Department of Biochemistry and Molecular Genetics, University of Virginia, Charlottesville, Virginia, United States of America.
Department of Biostatistics, Virginia Commonwealth University, Richmond, Virginia, United States of America.
PLoS Genet. 2017 Sep 28;13(9):e1006761. doi: 10.1371/journal.pgen.1006761. eCollection 2017 Sep.
Genome-wide association studies (GWAS) have discovered thousands of loci associated with disease risk and quantitative traits, yet most of the variants responsible for risk remain uncharacterized. The majority of GWAS-identified loci are enriched for non-coding single-nucleotide polymorphisms (SNPs), which makes defining the molecular mechanism of risk challenging. Many non-coding causal SNPs are hypothesized to affect organismal phenotypes by altering transcription factor (TF) binding sites. We employed an integrative genomics approach to identify candidate TF binding motifs that confer breast cancer-specific phenotypes identified by GWAS. We performed de novo motif analysis of regulatory elements, analyzed the evolutionary conservation of the identified motifs, and examined TF footprinting data to identify sequence elements that recruit TFs and maintain the chromatin landscape in breast cancer-relevant tissue and cell lines. We identified candidate causal SNPs that are predicted to alter TF binding within breast cancer-relevant regulatory regions and that are in strong linkage disequilibrium with significantly associated GWAS SNPs. Using CTCF ChIP-seq data, we confirmed that the TFs bind with the predicted allele-specific preferences. We used The Cancer Genome Atlas breast cancer patient data to identify ANKLE1 and ZNF404 as the target genes of candidate TF binding site SNPs in the 19p13.11 and 19q13.31 GWAS-identified loci. These SNPs are associated with the expression of ZNF404 and ANKLE1 in breast tissue. This integrative analysis pipeline is a general framework for identifying candidate causal variants within regulatory regions and TF binding sites that confer phenotypic variation and disease risk.
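The idea that a SNP is "predicted to alter TF binding" can be illustrated with a generic position weight matrix (PWM) comparison of the two alleles. The sketch below is not the authors' pipeline: the function names, the pseudocount, the uniform background, and the toy count matrix are all assumptions made for illustration, and only the forward strand is scanned.

```python
import math

BASES = "ACGT"

def log_odds_matrix(counts, pseudocount=1.0, background=0.25):
    """Convert per-position base counts into a log2-odds PWM.

    The pseudocount and uniform background are illustrative assumptions.
    """
    pwm = []
    for column in counts:
        total = sum(column[b] for b in BASES) + pseudocount * len(BASES)
        pwm.append({
            b: math.log2(((column[b] + pseudocount) / total) / background)
            for b in BASES
        })
    return pwm

def score(window, pwm):
    """Sum the log-odds of each base in a window as wide as the motif."""
    if len(window) != len(pwm):
        raise ValueError("window and motif must have the same width")
    return sum(column[base] for base, column in zip(window, pwm))

def best_window_score(seq, snp_index, pwm):
    """Best forward-strand score among motif windows that overlap the SNP."""
    width = len(pwm)
    first = max(0, snp_index - width + 1)
    last = min(snp_index, len(seq) - width)
    if last < first:
        raise ValueError("sequence is too short for this motif")
    return max(score(seq[s:s + width], pwm) for s in range(first, last + 1))

def allele_score_difference(seq, snp_index, ref, alt, pwm):
    """Alternate-minus-reference best score; negative means alt weakens the match."""
    ref_seq = seq[:snp_index] + ref + seq[snp_index + 1:]
    alt_seq = seq[:snp_index] + alt + seq[snp_index + 1:]
    return (best_window_score(alt_seq, snp_index, pwm)
            - best_window_score(ref_seq, snp_index, pwm))

# Toy 4-bp motif favoring "ACGT"; the counts are invented, not a real TF motif.
EXAMPLE_COUNTS = [
    {"A": 10, "C": 0, "G": 0, "T": 0},
    {"A": 0, "C": 10, "G": 0, "T": 0},
    {"A": 0, "C": 0, "G": 10, "T": 0},
    {"A": 0, "C": 0, "G": 0, "T": 10},
]
EXAMPLE_PWM = log_odds_matrix(EXAMPLE_COUNTS)

# Hypothetical SNP at index 3 (C > T) inside a made-up flanking sequence.
delta = allele_score_difference("TTACGTTT", 3, "C", "T", EXAMPLE_PWM)
print(f"alt - ref motif score: {delta:.3f}")
```

A negative difference suggests the alternate allele weakens the best motif match at that site. A real analysis would also scan the reverse complement and use motifs and background frequencies derived from data.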