STREME: accurate and versatile sequence motif discovery.

Affiliation

Department of Pharmacology, University of Nevada, Reno, NV 89557, USA.

Publication information

Bioinformatics. 2021 Sep 29;37(18):2834-2840. doi: 10.1093/bioinformatics/btab203.

DOI:10.1093/bioinformatics/btab203

PMID:33760053

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8479671/

Abstract

MOTIVATION

Sequence motif discovery algorithms can identify novel sequence patterns that perform biological functions in DNA, RNA and protein sequences, for example the binding site motifs of DNA- and RNA-binding proteins.

RESULTS

The STREME algorithm presented here advances the state of the art in ab initio motif discovery in terms of both accuracy and versatility. Using in vivo DNA (ChIP-seq) and RNA (CLIP-seq) data, and validating motifs with reference motifs derived from in vitro data, we show that STREME is more accurate, sensitive and thorough than several widely used algorithms (DREME, HOMER, MEME, Peak-motifs) and two other representative algorithms (ProSampler and Weeder). STREME's capabilities include the ability to find motifs in datasets with hundreds of thousands of sequences, to find both short and long motifs (from 3 to 30 positions), to perform differential motif discovery in pairs of sequence datasets, and to find motifs in sequences over virtually any alphabet (DNA, RNA, protein and user-defined alphabets). Unlike most motif discovery algorithms, STREME reports a useful estimate of the statistical significance of each motif it discovers. STREME is easy to use individually via its web server or via the command line, and is completely integrated with the widely used MEME Suite of sequence analysis tools. The name STREME stands for 'Simple, Thorough, Rapid, Enriched Motif Elicitation'.

AVAILABILITY AND IMPLEMENTATION

The STREME web server and source code are provided freely for non-commercial use at http://meme-suite.org.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
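
The abstract mentions differential motif discovery in pairs of sequence datasets and a significance estimate for each motif. As a rough illustration of that general idea only, the sketch below counts how many sequences in a primary set and a control set contain each fixed-length word (k-mer) and scores the enrichment with a one-sided Fisher exact test. This is a toy under stated assumptions, not STREME's algorithm or its statistics: the function names `fisher_enrichment_p` and `kmer_enrichment` are hypothetical, only exact words on the given strand are counted, and no multiple-testing correction is applied.

```python
from math import comb


def fisher_enrichment_p(a, b, c, d):
    """One-sided Fisher exact p-value for enrichment in the primary set.

    a: primary sequences containing the word, b: primary sequences without it,
    c: control sequences containing the word, d: control sequences without it.
    Returns P(X >= a) under the hypergeometric null.
    """
    n_primary = a + b
    n_with = a + c
    total = a + b + c + d
    denom = comb(total, n_primary)
    upper = min(n_primary, n_with)
    tail = sum(
        comb(n_with, x) * comb(total - n_with, n_primary - x)
        for x in range(a, upper + 1)
    )
    return tail / denom


def kmer_enrichment(primary, control, k):
    """Rank k-mers seen in `primary` by enrichment over `control`.

    Hypothetical helper: each sequence is counted at most once per k-mer.
    Returns (kmer, primary_hits, control_hits, p_value) tuples, best first.
    """
    def sequences_containing(seqs):
        counts = {}
        for s in seqs:
            # A set ensures one count per sequence, however often the word recurs.
            for kmer in {s[i:i + k] for i in range(len(s) - k + 1)}:
                counts[kmer] = counts.get(kmer, 0) + 1
        return counts

    p_counts = sequences_containing(primary)
    c_counts = sequences_containing(control)
    results = []
    for kmer, a in p_counts.items():
        c = c_counts.get(kmer, 0)
        p = fisher_enrichment_p(a, len(primary) - a, c, len(control) - c)
        results.append((kmer, a, c, p))
    results.sort(key=lambda r: r[3])
    return results


if __name__ == "__main__":
    primary = ["ACGTGA", "TTGTGA", "GTGACC"]
    control = ["AAAAAA", "CCCCCC", "TTTTTT"]
    for kmer, a, c, p in kmer_enrichment(primary, control, 4)[:3]:
        print(kmer, a, c, round(p, 3))
```

With these made-up inputs, the word shared by all three primary sequences and absent from the control set ranks first. A real tool would also need to handle degenerate positions, both DNA strands, and the inflation of significance that comes from testing many candidate words.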

Similar articles

STREME: accurate and versatile sequence motif discovery.

Bioinformatics. 2021 Sep 29;37(18):2834-2840. doi: 10.1093/bioinformatics/btab203.

MEME-ChIP: motif analysis of large DNA datasets.

Bioinformatics. 2011 Jun 15;27(12):1696-7. doi: 10.1093/bioinformatics/btr189. Epub 2011 Apr 12.

ProSampler: an ultrafast and accurate motif finder in large ChIP-seq datasets for combinatory motif discovery.

Bioinformatics. 2019 Nov 1;35(22):4632-4639. doi: 10.1093/bioinformatics/btz290.

DREME: motif discovery in transcription factor ChIP-seq data.

Bioinformatics. 2011 Jun 15;27(12):1653-9. doi: 10.1093/bioinformatics/btr261. Epub 2011 May 4.

MEME SUITE: tools for motif discovery and searching.

Nucleic Acids Res. 2009 Jul;37(Web Server issue):W202-8. doi: 10.1093/nar/gkp335. Epub 2009 May 20.

Motif-based analysis of large nucleotide data sets using MEME-ChIP.

Nat Protoc. 2014;9(6):1428-50. doi: 10.1038/nprot.2014.083. Epub 2014 May 22.

EXTREME: an online EM algorithm for motif discovery.

Bioinformatics. 2014 Jun 15;30(12):1667-73. doi: 10.1093/bioinformatics/btu093. Epub 2014 Feb 14.

MoMo: discovery of statistically significant post-translational modification motifs.

Bioinformatics. 2019 Aug 15;35(16):2774-2782. doi: 10.1093/bioinformatics/bty1058.

Finding de novo methylated DNA motifs.

Bioinformatics. 2019 Sep 15;35(18):3287-3293. doi: 10.1093/bioinformatics/btz079.

MEME: discovering and analyzing DNA and protein sequence motifs.

Nucleic Acids Res. 2006 Jul 1;34(Web Server issue):W369-73. doi: 10.1093/nar/gkl198.

Cited by

Regulatory logic of neuronal identity specification in .

bioRxiv. 2025 Sep 3:2025.09.01.673531. doi: 10.1101/2025.09.01.673531.

An RNA polymerase III tissue and tumor atlas uncovers context-specific activities linked to 3D epigenome regulatory mechanisms.

bioRxiv. 2025 Sep 1:2025.08.28.672650. doi: 10.1101/2025.08.28.672650.

BlihIA-A Novel Type I Restriction-Modification System from Is Sensitive to In Vitro Inhibition by ArdB Antirestriction Protein.

Int J Mol Sci. 2025 Sep 5;26(17):8674. doi: 10.3390/ijms26178674.

Multimodal Deep Learning for Generating Potential Anti-Dengue Peptides.

ACS Omega. 2025 Aug 19;10(34):38653-38674. doi: 10.1021/acsomega.5c03510. eCollection 2025 Sep 2.

YModPred: an interpretable prediction method for multi-type RNA modification sites in S. cerevisiae based on deep learning.

BMC Biol. 2025 Aug 29;23(1):272. doi: 10.1186/s12915-025-02372-y.

Comprehensive Transcriptomic and m6A Epitranscriptomic Analysis Reveals Colchicine-Induced Kidney Toxicity via DNA Damage and Autophagy in HK2 Cells.

Toxins (Basel). 2025 Aug 14;17(8):408. doi: 10.3390/toxins17080408.

IQSPred-PLM: An Interpretable Quorum Sensing Peptides Prediction Model Based on Protein Language Model.

Interdiscip Sci. 2025 Aug 26. doi: 10.1007/s12539-025-00766-8.

5' untranslated regions tune translation.

bioRxiv. 2025 Jul 14:2025.07.14.664749. doi: 10.1101/2025.07.14.664749.

SOX2 utilizes FOXA1 as a heteromeric transcriptional partner to drive proliferation in therapy-resistant prostate cancer.

bioRxiv. 2025 Jul 19:2025.07.18.664790. doi: 10.1101/2025.07.18.664790.

Quantitative modeling of mRNA degradation reveals tempo-dependent mRNA clearance in early embryos.

Nucleic Acids Res. 2025 Jul 19;53(14). doi: 10.1093/nar/gkaf737.

References

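ProSampler: an ultrafast and accurate motif finder in large ChIP-seq datasets for combinatory motif discovery.

Bioinformatics. 2019 Nov 1;35(22):4632-4639. doi: 10.1093/bioinformatics/btz290.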

C2H2 Zinc Finger Proteins: The Largest but Poorly Explored Family of Higher Eukaryotic Transcription Factors.

Acta Naturae. 2017 Apr-Jun;9(2):47-58.

Robust transcriptome-wide discovery of RNA-binding protein binding sites with enhanced CLIP (eCLIP).

Nat Methods. 2016 Jun;13(6):508-14. doi: 10.1038/nmeth.3810. Epub 2016 Mar 28.

A compendium of RNA-binding motifs for decoding gene regulation.

Nature. 2013 Jul 11;499(7457):172-7. doi: 10.1038/nature12311.

DNA-binding specificities of human transcription factors.

Cell. 2013 Jan 17;152(1-2):327-39. doi: 10.1016/j.cell.2012.12.009.

STEME: efficient EM to find motifs in large data sets.

Nucleic Acids Res. 2011 Oct;39(18):e126. doi: 10.1093/nar/gkr574. Epub 2011 Jul 23.

RSAT 2011: regulatory sequence analysis tools.

Nucleic Acids Res. 2011 Jul;39(Web Server issue):W86-91. doi: 10.1093/nar/gkr377.

DREME: motif discovery in transcription factor ChIP-seq data.

Bioinformatics. 2011 Jun 15;27(12):1653-9. doi: 10.1093/bioinformatics/btr261. Epub 2011 May 4.

MEME-ChIP: motif analysis of large DNA datasets.

Bioinformatics. 2011 Jun 15;27(12):1696-7. doi: 10.1093/bioinformatics/btr189. Epub 2011 Apr 12.

Simple combinations of lineage-determining transcription factors prime cis-regulatory elements required for macrophage and B cell identities.

Mol Cell. 2010 May 28;38(4):576-89. doi: 10.1016/j.molcel.2010.05.004.
