Center for Systems and Synthetic Biology, Institute for Cellular and Molecular Biology, Section of Molecular Genetics and Microbiology, University of Texas at Austin, Austin, TX 78712, USA.
BMC Genet. 2012 Sep 5;13:46. doi: 10.1186/1471-2156-13-46.
Single nucleotide polymorphisms (SNPs) have been associated with many aspects of human development and disease, and many non-coding SNPs associated with disease risk are presumed to affect gene regulation. We have previously shown that SNPs within transcription factor binding sites can affect transcription factor binding in an allele-specific and heritable manner. However, such analysis has relied on prior whole-genome genotypes provided by large external projects such as HapMap and the 1000 Genomes Project. This requirement limits the study of allele-specific effects of SNPs in primary patient samples from diseases of interest, where complete genotypes are not readily available.
In this study, we show that SNPs can be identified de novo and accurately from ChIP-seq data generated in the ENCODE Project. The SNPs we identified de novo from ChIP-seq data are highly concordant with published genotypes. Independent experimental verification of more than 100 sites puts our estimated false discovery rate at less than 5%. Analysis of transcription factor binding at the de novo identified SNPs revealed widespread heritable allele-specific binding, confirming previous observations. SNPs identified from ChIP-seq datasets were significantly enriched for disease-associated variants, and we identified dozens of allele-specific binding events in non-coding regions that could distinguish between disease and normal haplotypes.
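To make the idea of allele-specific binding at a de novo called SNP concrete, here is a minimal sketch under stated assumptions; it is not the authors' pipeline. It assumes reference and alternate read counts at a candidate SNP have already been tallied from ChIP-seq alignments. The helper names (`is_heterozygous`, `allele_specific_binding`) and the thresholds (`min_reads`, `min_minor_fraction`, `alpha`) are hypothetical illustrations. The allelic-imbalance test shown is a generic exact two-sided binomial test against a 50:50 expectation.

```python
"""Hypothetical sketch: genotype a candidate SNP from ChIP-seq allele counts
and test for allele-specific binding. Not the published method."""
from math import exp, lgamma, log


def binomial_pmf(i: int, n: int, p: float) -> float:
    """Binomial probability P(X = i), computed in log space to avoid overflow
    at high read depth. Assumes 0 < p < 1."""
    log_pmf = (lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
               + i * log(p) + (n - i) * log(1 - p))
    return exp(log_pmf)


def binomial_two_sided_p(k: int, n: int, p: float = 0.5) -> float:
    """Exact two-sided p-value: total probability of all outcomes that are
    no more likely than the observed count k."""
    probs = [binomial_pmf(i, n, p) for i in range(n + 1)]
    observed = probs[k]
    # A small relative tolerance keeps floating-point ties in the sum.
    total = sum(q for q in probs if q <= observed * (1 + 1e-7))
    return min(1.0, total)


def is_heterozygous(ref_reads: int, alt_reads: int,
                    min_reads: int = 10, min_minor_fraction: float = 0.1) -> bool:
    """Hypothetical genotype rule: call a site heterozygous when coverage is
    adequate and the minor allele appears in enough of the reads."""
    n = ref_reads + alt_reads
    if n < min_reads:
        return False
    return min(ref_reads, alt_reads) / n >= min_minor_fraction


def allele_specific_binding(ref_reads: int, alt_reads: int,
                            alpha: float = 0.05) -> tuple[float, bool]:
    """Test whether ChIP-seq reads deviate from a 50:50 allele ratio.
    Returns (p_value, flagged_as_allele_specific)."""
    n = ref_reads + alt_reads
    p_value = binomial_two_sided_p(ref_reads, n, 0.5)
    return p_value, p_value < alpha


if __name__ == "__main__":
    ref, alt = 30, 10  # illustrative counts, not data from the study
    if is_heterozygous(ref, alt):
        p_value, flagged = allele_specific_binding(ref, alt)
        print(f"p = {p_value:.4f}, allele-specific: {flagged}")
```

With 30 reference and 10 alternate reads the two-sided p-value is about 0.002, so the site would be flagged. The 50:50 null assumes no read-mapping bias toward the reference allele, and a site with extreme allelic imbalance can look homozygous in ChIP-seq reads alone. A real analysis would also need to handle both issues and correct for testing many sites.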
Our approach combines SNP discovery, genotyping and allele-specific analysis, but is selectively focused on functional regulatory elements occupied by transcription factors or epigenetic marks, and will therefore be valuable for identifying the functional regulatory consequences of non-coding SNPs in primary disease samples.