Integration of 198 ChIP-seq datasets reveals human cis-regulatory regions.

Authors

Bolouri Hamid, Ruzzo Walter L

Affiliation

Division of Human Biology, Fred Hutchinson Cancer Research Center, Seattle, Washington 98109, USA.

Publication information

J Comput Biol. 2012 Sep;19(9):989-97. doi: 10.1089/cmb.2012.0100. Epub 2012 Aug 16.

Abstract

We analyzed 198 datasets of chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) and developed a methodology for identification of high-confidence enhancer and promoter regions from transcription factor ChIP-seq data alone. We identify 32,467 genomic regions marked with ChIP-seq binding peaks in 15 or more experiments as high-confidence cis-regulatory regions. Although the selected regions mark only ~0.67% of the genome, 70.5% of our predicted binding regions fall within independently identified, strongly expression-correlated and histone-marked enhancer regions, which cover ~8% of the genome (Ernst et al., Nature 2011, 473, 43-49). Even more remarkably, 85.6% of our selected regions overlap transcription factor (TF) binding regions identified in evolutionarily conserved DNase1 hypersensitivity cluster regions, which cover 0.75% of the genome (Boyle et al., Genome Research 2011, 21, 456-464). P-values for these overlaps are effectively zero (Z-scores of 328 and 715, respectively). Furthermore, 62% of our selected regions overlap the intersection of the evolutionarily conserved DNase1 hypersensitivity-identified TF-binding regions of Boyle et al. (2011) with the histone-marked enhancers found to be strongly associated with transcriptional activity by Ernst et al. (2011). Two hundred thirty of our candidate cis-regulatory regions overlap cancer-associated variants reported in the Catalogue of Somatic Mutations in Cancer (http://www.sanger.ac.uk/genetics/CGP/cosmic/). We also identify 1,252 potential proximal promoters for the 7,561 disjoint lincRNA regions currently in the Human lincRNA Catalog (www.broadinstitute.org/genome_bio/human_lincrnas/). Our investigation used approximately half of all currently available ENCODE ChIP-seq datasets, suggesting further gains are likely from analysis of all datasets currently available.
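
The selection rule described in the abstract (regions carrying binding peaks in 15 or more experiments) can be illustrated with a minimal sketch. The abstract does not say how peaks were merged or how region boundaries were drawn, so everything below is an assumption made for illustration: peaks are taken as half-open (chrom, start, end) intervals tagged with an experiment identifier, a region is any maximal stretch covered by peaks from at least `min_experiments` distinct experiments, and the function name `high_confidence_regions` is hypothetical rather than taken from the paper.

```python
from collections import defaultdict


def high_confidence_regions(peaks, min_experiments=15):
    """Return maximal regions covered by peaks from >= min_experiments experiments.

    peaks: iterable of (chrom, start, end, experiment_id) with half-open
    intervals [start, end). This input layout is an assumption for
    illustration; it is not specified in the abstract.
    """
    # chrom -> position -> list of (experiment_id, +1 for start / -1 for end)
    events = defaultdict(lambda: defaultdict(list))
    for chrom, start, end, exp in peaks:
        events[chrom][start].append((exp, 1))
        events[chrom][end].append((exp, -1))

    regions = []
    for chrom in sorted(events):
        open_peaks = defaultdict(int)  # experiment_id -> number of open peaks
        region_start = None
        for pos in sorted(events[chrom]):
            # Apply every boundary at this coordinate before testing the
            # threshold, so abutting peaks do not split a region.
            for exp, delta in events[chrom][pos]:
                open_peaks[exp] += delta
            # Overlapping peaks from the same experiment count only once.
            n_experiments = sum(1 for count in open_peaks.values() if count > 0)
            if region_start is None and n_experiments >= min_experiments:
                region_start = pos
            elif region_start is not None and n_experiments < min_experiments:
                regions.append((chrom, region_start, pos))
                region_start = None
    return regions


if __name__ == "__main__":
    # Toy data with a threshold of 2 instead of 15, purely for illustration.
    toy_peaks = [
        ("chr1", 100, 200, "A"),
        ("chr1", 150, 250, "B"),
        ("chr1", 180, 300, "C"),
        ("chr2", 10, 50, "A"),
        ("chr2", 20, 40, "A"),
    ]
    print(high_confidence_regions(toy_peaks, min_experiments=2))
```

With the toy data, the only stretch supported by two distinct experiments is chr1:150-250; the two overlapping chr2 peaks come from the same experiment and therefore do not qualify. Counting distinct experiments rather than raw peaks is a design choice of this sketch, chosen to match the abstract's wording "in 15 or more experiments".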
