Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA.
Department of Human Genetics, University of Michigan, Ann Arbor, MI, USA.
Bioinformatics. 2018 Oct 15;34(20):3578-3580. doi: 10.1093/bioinformatics/bty396.
Motif discovery in large biopolymer sequence datasets can be computationally demanding, presenting significant challenges for discovery in omics research. MEME, arguably one of the most popular motif discovery tools, takes time quadratic in the dataset size, leading to excessively long runtimes for large datasets. There is therefore a demand for fast programs that can generate results of the same quality as MEME.
Here we describe YAMDA, a highly scalable motif discovery software package. It is built on PyTorch, a deep learning library with strong GPU acceleration whose highly optimized tensor operations are also useful for motif discovery. YAMDA finds motifs as accurately as MEME in linear time, completing in seconds or minutes, a speedup of more than a thousandfold.
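To illustrate why tensor operations suit motif work, the sketch below scores every window of a sequence against a position weight matrix. This is a 1-D cross-correlation of the one-hot encoded sequence with the matrix, and the cost grows linearly with sequence length for a fixed motif width. It is a plain-Python illustration, not YAMDA's implementation: the names `one_hot`, `log_odds`, `scan` and `toy_pwm`, the uniform background of 0.25, and the matrix values are all assumptions made for the example. In a tensor library such as PyTorch the same scan could be expressed with a batched convolution such as `torch.nn.functional.conv1d`.

```python
import math

ALPHABET = "ACGT"

def one_hot(seq):
    """Encode a DNA string as a list of 4-element rows ordered A, C, G, T."""
    return [[1.0 if base == letter else 0.0 for letter in ALPHABET] for base in seq]

def log_odds(pwm, background=0.25):
    """Convert per-position base probabilities into log2 odds against a uniform background."""
    return [[math.log2(p / background) for p in row] for row in pwm]

def scan(seq, pwm):
    """Score every window of len(pwm) bases in seq (a 1-D cross-correlation)."""
    x = one_hot(seq)
    w = log_odds(pwm)
    width = len(w)
    scores = []
    for start in range(len(x) - width + 1):
        # Sum the log-odds of the base observed at each motif position.
        s = sum(x[start + i][j] * w[i][j] for i in range(width) for j in range(4))
        scores.append(s)
    return scores

# Hypothetical 3-bp motif that strongly prefers "TAG" (illustrative values only).
toy_pwm = [
    [0.05, 0.05, 0.05, 0.85],  # position 1: mostly T
    [0.85, 0.05, 0.05, 0.05],  # position 2: mostly A
    [0.05, 0.05, 0.85, 0.05],  # position 3: mostly G
]

scores = scan("CCTAGCC", toy_pwm)
# Index of the highest-scoring window, i.e. the most likely motif start.
best = max(range(len(scores)), key=scores.__getitem__)
```

Here `best` is the start index of the highest-scoring window. Because each window is scored independently, the loop maps directly onto a batched tensor operation, which is the kind of workload GPU libraries accelerate.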
YAMDA is freely available on GitHub (https://github.com/daquang/YAMDA).
Supplementary data are available at Bioinformatics online.