Department of Pharmacology, University of Nevada, Reno, NV 89557, USA.
Bioinformatics. 2021 Sep 29;37(18):2834-2840. doi: 10.1093/bioinformatics/btab203.
Sequence motif discovery algorithms can identify novel sequence patterns that perform biological functions in DNA, RNA and protein sequences, for example, the binding site motifs of DNA- and RNA-binding proteins.
The STREME algorithm presented here advances the state of the art in ab initio motif discovery in terms of both accuracy and versatility. Using in vivo DNA (ChIP-seq) and RNA (CLIP-seq) data, and validating motifs with reference motifs derived from in vitro data, we show that STREME is more accurate, sensitive and thorough than several widely used algorithms (DREME, HOMER, MEME, Peak-motifs) and two other representative algorithms (ProSampler and Weeder). STREME's capabilities include the ability to find motifs in datasets with hundreds of thousands of sequences, to find both short and long motifs (from 3 to 30 positions), to perform differential motif discovery in pairs of sequence datasets, and to find motifs in sequences over virtually any alphabet (DNA, RNA, protein and user-defined alphabets). Unlike most motif discovery algorithms, STREME reports a useful estimate of the statistical significance of each motif it discovers. STREME is easy to use on its own via its web server or the command line, and is completely integrated with the widely used MEME Suite of sequence analysis tools. The name STREME stands for 'Simple, Thorough, Rapid, Enriched Motif Elicitation'.
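To make the idea of differential motif discovery with a significance estimate concrete, the following is a minimal sketch, not STREME's implementation. It assumes the simplest possible setting: a candidate motif is a single exact word, each sequence either contains it or does not, and enrichment in a primary set relative to a control set is scored with a one-sided Fisher exact test. The function names (`count_hits`, `fisher_enrichment_pvalue`) and the toy sequences are hypothetical, chosen only for illustration.

```python
from math import comb


def count_hits(seqs, word):
    """Count sequences that contain `word` at least once (case-insensitive)."""
    word = word.upper()
    return sum(1 for s in seqs if word in s.upper())


def fisher_enrichment_pvalue(pos_hits, pos_total, neg_hits, neg_total):
    """One-sided Fisher exact test for enrichment in the primary set.

    Returns the hypergeometric tail probability of seeing at least
    `pos_hits` matching sequences among the `pos_total` primary sequences,
    given the total number of matching sequences in both sets.
    """
    total = pos_total + neg_total
    hits = pos_hits + neg_hits
    denom = comb(total, pos_total)
    upper = min(hits, pos_total)
    tail = sum(
        comb(hits, k) * comb(total - hits, pos_total - k)
        for k in range(pos_hits, upper + 1)
    )
    return tail / denom


# Toy data (hypothetical): a primary set and a control set of equal size.
primary = ["ACGTGACC", "TTACGTGA", "GGACGTGT", "CCCCCCCC"]
control = ["AAAAAAAA", "TTTTTTTT", "GGGGGGGG", "ACGTGCCC"]

word = "ACGTG"
p_value = fisher_enrichment_pvalue(
    count_hits(primary, word), len(primary),
    count_hits(control, word), len(control),
)
print(f"{word}: p = {p_value:.4f}")
```

A real motif finder would score position weight matrices rather than exact words and would correct for the many candidates it tests; the sketch only illustrates how a primary/control comparison can yield a p-value for one candidate.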
The STREME web server and source code are provided freely for non-commercial use at http://meme-suite.org.
Supplementary data are available at Bioinformatics online.