University of Rochester Medical Center, Rochester, NY, USA.
Physiol Genomics. 2011 Sep 22;43(18):1038-48. doi: 10.1152/physiolgenomics.00098.2011. Epub 2011 Jul 19.
Regulatory SNPs (rSNPs) reside primarily within the non-protein-coding genome and are thought to disturb normal patterns of gene expression by altering DNA binding of transcription factors. Nevertheless, despite the explosive rise in SNP association studies, little is known about the function of rSNPs in human disease. Serum response factor (SRF) is a widely expressed DNA-binding transcription factor that has variable affinity for at least 1,216 permutations of a 10 bp transcription factor binding site (TFBS) known as the CArG box. We developed a robust in silico bioinformatics screening method to evaluate sequences around RefSeq genes for conserved CArG boxes. Utilizing a predetermined phastCons threshold score, we identified 8,252 strand-specific CArGs within an 8 kb window around the transcription start site of 5,213 genes, including all previously defined SRF target genes. We then interrogated this CArG dataset for the presence of previously annotated common polymorphisms. We found a total of 118 unique CArG boxes harboring a SNP within the 10 bp CArG sequence and 1,130 CArG boxes with SNPs located just outside the CArG element. Gel shift and luciferase reporter assays validated SRF binding and functional activity of several new CArG boxes. Importantly, SNPs within or just outside the CArG box often resulted in altered SRF binding and activity. Collectively, these findings demonstrate a powerful approach to computationally define rSNPs in the human CArGome and provide a foundation for similar analyses of other TFBS. Such information may find utility in genetic association studies of human disease, where little is known about the functionality of rSNPs.
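The screening steps summarized above can be illustrated with a minimal Python sketch. It is not the authors' pipeline, and several pieces are assumptions: the 1,216 CArG-like 10-mers are modeled here as the 64 consensus CC(A/T)6GG boxes plus every single-base substitution of them (64 + 4×3×64 + 6×2×32 = 1,216), the 8 kb window is taken as a symmetric ±4 kb around the TSS, conservation is scored as the mean of per-base scores over the box against a placeholder threshold of 0.5, and "just outside" is treated as within a hypothetical 10 bp of either edge. The function names `carg_like_set`, `conserved_cargs`, and `classify_snp` are illustrative only.

```python
from itertools import product

CARG_LEN = 10  # a CArG box is a 10-bp element


def carg_like_set():
    """Consensus CC(A/T)6GG boxes plus every single-base substitution of them.

    Assumption: this definition yields exactly 1,216 distinct 10-mers
    (64 + 4*3*64 + 6*2*32); the abstract does not state how its 1,216
    permutations were derived.
    """
    consensus = {"CC" + "".join(core) + "GG" for core in product("AT", repeat=6)}
    variants = set(consensus)
    for seq in consensus:
        for i, original in enumerate(seq):
            for base in "ACGT":
                if base != original:
                    variants.add(seq[:i] + base + seq[i + 1:])
    return variants


def conserved_cargs(seq, scores, tss, motifs, half_window=4000, threshold=0.5):
    """0-based starts of CArG-like 10-mers lying wholly inside tss +/- half_window
    whose mean per-base conservation score is >= threshold.

    Hypothetical choices: the symmetric window, averaging the scores over the
    box, and the threshold value (the real phastCons cutoff is not given).
    """
    seq = seq.upper()
    lo = max(0, tss - half_window)
    end = min(len(seq), tss + half_window)
    hits = []
    for start in range(lo, end - CARG_LEN + 1):
        if seq[start:start + CARG_LEN] in motifs:
            mean_score = sum(scores[start:start + CARG_LEN]) / CARG_LEN
            if mean_score >= threshold:
                hits.append(start)
    return hits


def classify_snp(snp_pos, carg_start, flank=10):
    """Return 'within' if the SNP lies in the 10-bp box, 'flanking' if it lies
    within `flank` bp of either edge (hypothetical cutoff), otherwise None."""
    carg_end = carg_start + CARG_LEN  # exclusive end
    if carg_start <= snp_pos < carg_end:
        return "within"
    if carg_start - flank <= snp_pos < carg_end + flank:
        return "flanking"
    return None


motifs = carg_like_set()
print(len(motifs))  # 1216

seq = "AAAA" + "CCATATATGG" + "AAAA"  # toy sequence with one consensus box
scores = [0.9] * len(seq)             # toy per-base conservation scores
starts = conserved_cargs(seq, scores, tss=9, motifs=motifs)
print(starts)  # [4]
print([classify_snp(p, starts[0]) for p in (6, 15, 30)])
# ['within', 'flanking', None]
```

Because the consensus set CC(A/T)6GG is its own reverse complement, the 1,216-member set in this sketch is closed under reverse complementation, so scanning one strand detects boxes on either strand; recording which strand a box sits on would need an extra step.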