Department of Computational Molecular Biology, Max Planck Institute for Molecular Genetics, Ihnestrasse 73, Berlin 14195, Germany.
Nucleic Acids Res. 2012 Feb;40(4):e31. doi: 10.1093/nar/gkr1104. Epub 2011 Dec 8.
ChIP-seq is increasingly used to characterize transcription factor binding and chromatin marks at a genomic scale. Various tools are now available to extract binding motifs from peak data sets. However, most approaches are only available as command-line programs, or via a website but with size restrictions. We present peak-motifs, a computational pipeline that discovers motifs in peak sequences, compares them with databases, exports putative binding sites for visualization in the UCSC genome browser and generates an extensive report suited for both naive and expert users. It relies on time- and memory-efficient algorithms that can process several thousand peaks within minutes. In terms of running time, peak-motifs outperforms all comparable tools by several orders of magnitude. We demonstrate its accuracy by analyzing data sets ranging from 4,000 to 128,000 peaks for 12 embryonic stem cell-specific transcription factors. In all cases, the program finds the expected motifs and returns additional motifs potentially bound by cofactors. We further apply peak-motifs to discover tissue-specific motifs in peak collections for the p300 transcriptional co-activator. To our knowledge, peak-motifs is the only tool that performs a complete motif analysis and offers a user-friendly web interface without any restriction on sequence size or number of peaks.
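The abstract does not describe the underlying algorithms. The sketch below illustrates why word-based motif discovery can scale to large peak sets. It counts k-mers in peak sequences in a single linear pass, merges each word with its reverse complement, and ranks words by an observed/expected ratio against a background set.

This is a generic, hypothetical example, not the peak-motifs implementation or its scoring statistics. The function names `count_kmers` and `enrichment`, the uniform pseudocount, and the ratio score are all assumptions made for illustration.

```python
from collections import Counter
from itertools import product

# Complement table; assumes sequences are uppercased A/C/G/T (other symbols are skipped).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(word: str) -> str:
    """Return the reverse complement of a DNA word."""
    return word.translate(_COMPLEMENT)[::-1]


def canonical(word: str) -> str:
    """Merge a word with its reverse complement so both strands count together."""
    return min(word, revcomp(word))


def count_kmers(seqs: list[str], k: int) -> Counter:
    """Count canonical k-mers with one linear pass over each sequence."""
    counts: Counter = Counter()
    for seq in seqs:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            word = seq[i : i + k]
            if set(word) <= set("ACGT"):  # skip windows containing N or other symbols
                counts[canonical(word)] += 1
    return counts


def enrichment(peaks: list[str], background: list[str], k: int = 6,
               pseudocount: float = 1.0) -> list[tuple[str, int, float]]:
    """Rank peak words by observed/expected ratio (hypothetical scoring scheme)."""
    fg = count_kmers(peaks, k)
    bg = count_kmers(background, k)
    # Number of distinct canonical k-mers, used to spread the pseudocount.
    vocab = len({canonical("".join(p)) for p in product("ACGT", repeat=k)})
    n_fg = sum(fg.values())
    n_bg = sum(bg.values()) + pseudocount * vocab
    ranked = []
    for word, observed in fg.items():
        # Expected count in peaks if words occurred at background frequencies.
        expected = n_fg * (bg[word] + pseudocount) / n_bg
        ranked.append((word, observed, observed / expected))
    ranked.sort(key=lambda t: (-t[2], t[0]))  # highest ratio first, ties by word
    return ranked


if __name__ == "__main__":
    peaks = ["AGATAC", "CGATAG", "TGATAT"]  # toy "peaks" sharing a GATA word
    background = ["ACGTACGTACGTACGT"]
    print(enrichment(peaks, background, k=4)[0][:2])  # ('GATA', 3)
```

Counting takes time proportional to the total sequence length, and memory is bounded by the number of distinct words. A real pipeline would replace the ad hoc ratio with a proper significance test and assemble enriched words into motifs.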