Lane Center for Computational Biology, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA.
Bioinformatics. 2011 Sep 1;27(17):2361-7. doi: 10.1093/bioinformatics/btr412. Epub 2011 Jul 12.
Motif discovery is now routinely used in high-throughput studies, including large-scale sequencing and proteomics. These datasets present new challenges. The first is speed: many motif discovery methods do not scale well to large datasets. The second is identifying discriminative rather than generative motifs. Such discriminative motifs are important for identifying co-factors and for explaining changes in behavior between different conditions.
To address these issues, we developed a method for DECOnvolved Discriminative motif discovery (DECOD). DECOD uses a k-mer count table, so its running time is independent of the size of the input set. By deconvolving the k-mers, DECOD considers context information without using the sequences directly. DECOD outperforms previous methods in both speed and accuracy on simulated and real biological benchmark data. We performed new binding experiments for p53 mutants and used DECOD to identify p53 co-factors, suggesting new mechanisms for p53 activation.
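As a rough illustration of the k-mer count table idea, the following Python sketch counts k-mers in a positive and a negative sequence set and scores each k-mer by the difference in its relative frequency. It is not DECOD's algorithm and omits the deconvolution step entirely. The function names `kmer_counts` and `discriminative_scores` and the frequency-difference score are assumptions made for this example. It shows only why later steps can be independent of input size: once the table is built, the work depends on the number of distinct k-mers rather than on the number of sequences.

```python
from collections import Counter


def kmer_counts(sequences, k):
    """Count every length-k substring across a collection of sequences."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        # Slide a window of width k over the sequence.
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def discriminative_scores(positive, negative, k):
    """Score each k-mer by (frequency in positive) - (frequency in negative).

    Hypothetical scoring used for illustration only; it is not the
    objective function described for DECOD.
    """
    pos = kmer_counts(positive, k)
    neg = kmer_counts(negative, k)
    # Guard against empty inputs to avoid division by zero.
    pos_total = sum(pos.values()) or 1
    neg_total = sum(neg.values()) or 1
    # Counter returns 0 for k-mers that are absent from one of the sets.
    return {
        kmer: pos[kmer] / pos_total - neg[kmer] / neg_total
        for kmer in set(pos) | set(neg)
    }


if __name__ == "__main__":
    positive = ["ACGTACGT", "TTACGTAA"]
    negative = ["TTTTTTTT", "GGGGCCCC"]
    scores = discriminative_scores(positive, negative, k=4)
    # Print the k-mers most enriched in the positive set.
    for kmer, score in sorted(scores.items(), key=lambda kv: -kv[1])[:3]:
        print(kmer, round(score, 3))
```

After the two counting passes, the scoring loop touches at most 4^k entries for DNA, however many sequences were read.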
The source code and binaries for DECOD are available at http://www.sb.cs.cmu.edu/DECOD
Contact: zivbj@cs.cmu.edu
Supplementary data are available at Bioinformatics online.