Department of Computational Molecular Biology, Max Planck Institute for Molecular Genetics, Berlin, Germany.
Nat Protoc. 2012 Jul 26;7(8):1551-68. doi: 10.1038/nprot.2012.088.
This protocol explains how to use the online integrated pipeline 'peak-motifs' (http://rsat.ulb.ac.be/rsat/) to predict motifs and binding sites in full-size peak sets obtained by chromatin immunoprecipitation-sequencing (ChIP-seq) or related technologies. The workflow combines four time- and memory-efficient motif discovery algorithms to extract significant motifs from the sequences. Discovered motifs are compared with databases of known motifs to identify potentially bound transcription factors. Sequences are scanned to predict transcription factor binding sites and analyze their enrichment and positional distribution relative to peak centers. Peaks and binding sites are exported as BED tracks that can be uploaded into the University of California Santa Cruz (UCSC) genome browser for visualization in the genomic context. This protocol is illustrated with the analysis of a set of 6,000 peaks (8 Mb in total) bound by the Drosophila transcription factor Krüppel. The complete workflow is achieved in about 25 min of computational time on the Regulatory Sequence Analysis Tools (RSAT) Web server. This protocol can be followed in about 1 h.
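The last two analysis steps, locating predicted sites relative to peak centers and exporting them as BED tracks, can be illustrated with a minimal Python sketch. This is not the peak-motifs implementation: the function names, the example coordinates, and the 50 bp bin width are all hypothetical, and the sketch assumes peaks and sites are given as 0-based, half-open intervals, as in the BED format.

```python
from collections import Counter


def site_to_bed(chrom, peak_start, site_offset, site_len, name, strand="+"):
    """Build a 6-column BED line for a site found inside a peak.

    site_offset is the 0-based position of the site within the peak
    sequence (an assumption of this sketch, not an RSAT convention).
    """
    start = peak_start + site_offset
    end = start + site_len
    # BED columns: chrom, start, end, name, score, strand
    return f"{chrom}\t{start}\t{end}\t{name}\t0\t{strand}"


def offset_from_center(peak_start, peak_end, site_start, site_end):
    """Signed distance from the peak center to the site center, in bp."""
    peak_center = (peak_start + peak_end) // 2
    site_center = (site_start + site_end) // 2
    return site_center - peak_center


def positional_histogram(offsets, bin_width=50):
    """Count sites per bin; each bin is keyed by its lower bound."""
    counts = Counter((o // bin_width) * bin_width for o in offsets)
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    # Hypothetical peak chr2L:1000-1400 with a 10 bp site at offset 190.
    print(site_to_bed("chr2L", 1000, 190, 10, "Kr_site"))
    print(offset_from_center(1000, 1400, 1190, 1200))
    print(positional_histogram([-5, 10, 60, 49]))
```

A histogram that piles up in the bins around 0 would suggest that sites concentrate near peak centers, which is the kind of positional signal the protocol reports.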