Department of Molecular Biology, Faculty of Science, Nijmegen Centre for Molecular Life Sciences, Radboud University Nijmegen, Nijmegen, The Netherlands.
Bioinformatics. 2011 Jan 15;27(2):270-1. doi: 10.1093/bioinformatics/btq636. Epub 2010 Nov 15.
Accurate prediction of transcription factor binding motifs that are enriched in a collection of sequences remains a computational challenge. Here we report on GimmeMotifs, a pipeline that incorporates an ensemble of computational tools to predict motifs de novo from ChIP-sequencing (ChIP-seq) data. Similar, redundant motifs are compared using the weighted information content (WIC) similarity score and clustered using an iterative procedure. A comprehensive output report is generated with several evaluation metrics to compare and assess the results. Benchmarks show that the method performs well on human and mouse ChIP-seq datasets. GimmeMotifs consists of a suite of command-line scripts that can easily be integrated into a ChIP-seq analysis pipeline.
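The abstract names the WIC similarity score and an iterative clustering step but gives no formulas. The sketch below is an illustration under stated assumptions, not the GimmeMotifs implementation. It assumes a per-column score of sqrt(IC_x * IC_y) minus c times the summed absolute difference of per-base IC contributions (c = 2.5), a uniform background, ungapped alignment on both strands, and greedy merging of the most similar pair of clusters above a hypothetical threshold. The names `wic_column`, `wic_score` and `cluster_motifs` are illustrative only; the exact definition should be checked against the paper's supplementary methods.

```python
import math
from itertools import combinations

# Columns are probability vectors in A, C, G, T order (assumption).

def column_ic(col, bg=0.25):
    """Per-base IC contributions p_b * log2(p_b / bg) for one PFM column."""
    return [p * math.log2(p / bg) if p > 0 else 0.0 for p in col]

def wic_column(x, y, c=2.5):
    """Assumed WIC-style column score: sqrt(IC_x * IC_y) - c * DIC(x, y)."""
    ic_x, ic_y = column_ic(x), column_ic(y)
    # Total column IC is a KL divergence, so it is never negative.
    geo_mean = math.sqrt(max(sum(ic_x), 0.0) * max(sum(ic_y), 0.0))
    dic = sum(abs(a - b) for a, b in zip(ic_x, ic_y))
    return geo_mean - c * dic

def revcomp(pfm):
    """Reverse complement: reverse column order; reversing ACGT gives TGCA."""
    return [list(col[::-1]) for col in reversed(pfm)]

def wic_score(m1, m2, c=2.5):
    """Best summed column score over all ungapped offsets and both strands."""
    best = float("-inf")
    for cand in (m2, revcomp(m2)):
        for offset in range(-(len(cand) - 1), len(m1)):
            # Sum over the columns where the two motifs overlap at this offset.
            score = sum(
                wic_column(col, cand[i - offset], c)
                for i, col in enumerate(m1)
                if 0 <= i - offset < len(cand)
            )
            best = max(best, score)
    return best

def cluster_motifs(motifs, threshold):
    """Hypothetical greedy iterative clustering of a {name: pfm} dict.

    Repeatedly merges the most similar pair of clusters while their score
    exceeds `threshold`. Each cluster is represented by its first member,
    a simplification of whatever averaging the real tool may perform.
    """
    clusters = [[name] for name in motifs]
    while len(clusters) > 1:
        best_pair, best_score = None, threshold
        for a, b in combinations(range(len(clusters)), 2):
            s = wic_score(motifs[clusters[a][0]], motifs[clusters[b][0]])
            if s > best_score:
                best_pair, best_score = (a, b), s
        if best_pair is None:
            break
        a, b = best_pair  # a < b, so deleting b leaves index a valid
        clusters[a].extend(clusters[b])
        del clusters[b]
    return clusters

if __name__ == "__main__":
    gata = [[0, 0, 1, 0], [1, 0, 0, 0], [0, 0, 0, 1], [1, 0, 0, 0]]  # GATA
    ccaat = [[0, 1, 0, 0], [0, 1, 0, 0], [1, 0, 0, 0],
             [1, 0, 0, 0], [0, 0, 0, 1]]  # CCAAT
    motifs = {"gata_like": gata, "gata_rc": revcomp(gata), "ccaat": ccaat}
    print(cluster_motifs(motifs, threshold=5.0))
```

With one-hot columns, a matching column scores 2 and a mismatching one scores 2 - 2.5 * 4 = -8, so the GATA motif and its reverse complement (TATC) score 8 and are merged, while CCAAT stays in its own cluster. Searching both strands is what lets a motif and its reverse complement be recognized as redundant.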
GimmeMotifs is implemented in Python and runs on Linux. The source code is freely available for download at http://www.ncmls.eu/bioinfo/gimmemotifs/.
Supplementary data are available at Bioinformatics online.