Griffon Aurélien, Barbier Quentin, Dalino Jordi, van Helden Jacques, Spicuglia Salvatore, Ballester Benoit
INSERM, UMR1090 TAGC, Marseille, F-13288, France; Aix-Marseille Université, UMR1090 TAGC, Marseille, F-13288, France.
Nucleic Acids Res. 2015 Feb 27;43(4):e27. doi: 10.1093/nar/gku1280. Epub 2014 Dec 3.
The large collections of ChIP-seq data rapidly accumulating in public data warehouses provide genome-wide binding site maps for hundreds of transcription factors (TFs). However, the extent of the regulatory occupancy space in the human genome has not yet been fully assessed by integrating public ChIP-seq data sets and combining them with the ENCODE TF maps. To enable genome-wide identification of regulatory elements, we collected, analysed and retained 395 available ChIP-seq data sets and merged them with ENCODE peaks, covering a total of 237 TFs. This enhanced repertoire complements and refines current genome-wide occupancy maps by increasing the human genome regulatory search space by 14% compared to ENCODE alone, and also increases the complexity of the regulatory dictionary. As a direct application, we used this unified binding repertoire to annotate variant enhancer loci (VELs) derived from the H3K4me1 mark in two cancer cell lines (MCF-7, CRC) and observed enrichments of specific TFs involved in biological functions key to cancer development and proliferation. These enrichments of TFs within VELs provide a direct annotation of non-coding regions detected in cancer genomes. Finally, full access to this catalogue is available online, together with the TF enrichment analysis tool (http://tagc.univ-mrs.fr/remap/).
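To make the "regulatory search space" figure concrete, the following is a minimal sketch of how a coverage increase of this kind could be computed. It is not the authors' pipeline. The function names (`merge_intervals`, `covered_bp`, `coverage_increase_pct`) and the toy peak lists are hypothetical, and peaks are assumed to be BED-style half-open `(chrom, start, end)` tuples.

```python
from collections import defaultdict


def merge_intervals(peaks):
    """Merge overlapping or touching (chrom, start, end) peaks per chromosome."""
    by_chrom = defaultdict(list)
    for chrom, start, end in peaks:
        by_chrom[chrom].append((start, end))

    merged = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        out = [list(intervals[0])]
        for start, end in intervals[1:]:
            if start <= out[-1][1]:
                # Overlaps (or touches) the previous interval: extend it.
                out[-1][1] = max(out[-1][1], end)
            else:
                out.append([start, end])
        merged[chrom] = [tuple(iv) for iv in out]
    return merged


def covered_bp(peaks):
    """Number of base pairs covered by at least one peak."""
    merged = merge_intervals(peaks)
    return sum(end - start for ivs in merged.values() for start, end in ivs)


def coverage_increase_pct(reference_peaks, extra_peaks):
    """Percent gain in covered bases when extra_peaks are added to reference_peaks."""
    base = covered_bp(reference_peaks)
    combined = covered_bp(list(reference_peaks) + list(extra_peaks))
    return 100.0 * (combined - base) / base


# Hypothetical toy data, standing in for ENCODE peaks and additional public peaks.
encode_like = [("chr1", 100, 200), ("chr1", 150, 300), ("chr2", 0, 100)]
public_like = [("chr1", 250, 350), ("chr2", 500, 550)]

print(covered_bp(encode_like))                 # 300
print(covered_bp(encode_like + public_like))   # 400
print(round(coverage_increase_pct(encode_like, public_like), 2))  # 33.33
```

Merging before counting ensures that bases bound by several TFs, or reported in several data sets, are counted once, so the percentage reflects newly covered sequence only.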