School of Mathematics, Shandong University, Jinan 250100, China.
Department of Bioinformatics and Genomics, College of Computing and Informatics, The University of North Carolina at Charlotte, Charlotte, NC 28223, USA.
Bioinformatics. 2019 Nov 1;35(22):4632-4639. doi: 10.1093/bioinformatics/btz290.
The availability of numerous ChIP-seq datasets for transcription factors (TFs) has provided an unprecedented opportunity to identify all TF binding sites in genomes. However, progress has been hindered by the lack of a highly efficient and accurate tool that can find not only the target motifs but also cooperative motifs in very large datasets.
We present ProSampler, an ultrafast and accurate motif-finding algorithm based on a novel numeration method and a Gibbs sampler. ProSampler runs orders of magnitude faster than the fastest existing tools while often identifying the motifs of both the target TFs and their cooperators more accurately. ProSampler can thus greatly facilitate efforts to identify the entire cis-regulatory code in genomes.
Source code and binaries are freely available for download at https://github.com/zhengchangsulab/prosampler. ProSampler is implemented in C++ and supported on Linux, macOS and MS Windows platforms.
Supplementary materials are available at Bioinformatics online.