Direct vs 2-stage approaches to structured motif finding.

Authors

Federico Maria, Leoncini Mauro, Montangero Manuela, Valente Paolo

Affiliation

Dipartimento di Scienze Fisiche, Informatiche e Matematiche, Università di Modena e Reggio Emilia, 41125 Modena, Via Campi 213/b, Italy.

Publication

Algorithms Mol Biol. 2012 Aug 21;7(1):20. doi: 10.1186/1748-7188-7-20.

DOI:10.1186/1748-7188-7-20
PMID:22908910
Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC3564690/
Abstract

BACKGROUND

The notion of DNA motif is a mathematical abstraction used to model regions of the DNA (known as Transcription Factor Binding Sites, or TFBSs) that are bound by a given Transcription Factor to regulate gene expression or repression. In turn, DNA structured motifs are a mathematical counterpart that models sets of TFBSs that work in concert in the gene regulation processes of higher eukaryotic organisms. Typically, a structured motif is composed of an ordered set of isolated (or simple) motifs, separated by a variable, though constrained, number of "irrelevant" base pairs. Discovering structured motifs in a set of DNA sequences is a computationally hard problem that a number of authors have addressed either with a direct approach or by first identifying simple motifs and then combining them.
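To make the definition concrete, the following Python sketch lists where a structured motif occurs in a single sequence. It assumes a simplified model that is common in the literature but is not taken from this paper: each simple motif (a "box") is a fixed string that may match with at most `max_mismatches` substitutions (Hamming distance), and the spacer between consecutive boxes must have a length within a given `(min, max)` range. `StructuredMotif`, `occurrences` and all parameter names are hypothetical.

```python
from dataclasses import dataclass


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


@dataclass
class StructuredMotif:
    # Hypothetical representation (not SISMA's): ordered boxes, the allowed
    # spacer-length range between consecutive boxes, and a per-box mismatch budget.
    boxes: list[str]
    gaps: list[tuple[int, int]]
    max_mismatches: int = 0

    def __post_init__(self) -> None:
        if len(self.gaps) != len(self.boxes) - 1:
            raise ValueError("need exactly one (min, max) gap per pair of adjacent boxes")

    def _box_matches(self, seq: str, box: str, pos: int) -> bool:
        window = seq[pos:pos + len(box)]
        return len(window) == len(box) and hamming(box, window) <= self.max_mismatches

    def occurrences(self, seq: str) -> list[list[int]]:
        """All lists of box start positions satisfying every box and gap constraint."""
        found = []

        def extend(i: int, starts: list[int]) -> None:
            if i == len(self.boxes):
                found.append(starts)
                return
            prev_end = starts[-1] + len(self.boxes[i - 1])
            lo, hi = self.gaps[i - 1]
            # Try every spacer length allowed between box i-1 and box i.
            for pos in range(prev_end + lo, prev_end + hi + 1):
                if self._box_matches(seq, self.boxes[i], pos):
                    extend(i + 1, starts + [pos])

        for p in range(len(seq) - len(self.boxes[0]) + 1):
            if self._box_matches(seq, self.boxes[0], p):
                extend(1, [p])
        return found


if __name__ == "__main__":
    motif = StructuredMotif(boxes=["TATA", "CAAT"], gaps=[(2, 4)])
    print(motif.occurrences("GGTATAAGCCAATGG"))  # [[2, 9]]
```

Here the first box starts at position 2 and the second at position 9, leaving a 3-base spacer, which lies inside the allowed range of 2 to 4.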

RESULTS

We describe a computational tool, named SISMA, for the de novo discovery of structured motifs in a set of DNA sequences. SISMA is an exact, enumerative algorithm, meaning that it finds all the motifs conforming to the specifications. It does so in two stages: first it discovers all the possible component simple motifs, then it combines them in a way that respects the given constraints. We developed SISMA mainly to understand the potential benefits of such a 2-stage approach with respect to direct methods. In fact, no 2-stage software was available for the general problem of structured motif discovery; only a few tools solved restricted versions of the problem. We evaluated SISMA against other published tools on a comprehensive benchmark made of both synthetic and real biological datasets. SISMA outperformed the competitors in a significant number of cases and still performed well in most of the cases in which it was inferior.
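The 2-stage idea can be illustrated with a deliberately small Python sketch. It is not SISMA's algorithm: it assumes exact (mismatch-free) simple motifs of a fixed length `length`, structured motifs made of exactly two boxes, a single spacer range `gap_min..gap_max`, and a quorum `quorum` (the minimum number of input sequences that must contain an occurrence). All function and parameter names are hypothetical. Stage 1 enumerates every candidate simple motif with its occurrences; stage 2 only combines those occurrence lists and never rescans the sequences.

```python
from collections import defaultdict
from itertools import product


def stage1_simple_motifs(seqs, length, quorum):
    """Stage 1 (toy): map every exact l-mer to {sequence index: [start positions]},
    keeping only the l-mers that occur in at least `quorum` sequences."""
    occ = defaultdict(lambda: defaultdict(list))
    for s_idx, seq in enumerate(seqs):
        for pos in range(len(seq) - length + 1):
            occ[seq[pos:pos + length]][s_idx].append(pos)
    return {kmer: dict(by_seq) for kmer, by_seq in occ.items() if len(by_seq) >= quorum}


def stage2_combine(simple, length, gap_min, gap_max, quorum):
    """Stage 2 (toy): keep ordered pairs (a, b) of stage-1 motifs such that, in at
    least `quorum` sequences, b starts gap_min..gap_max bases after an occurrence of a ends."""
    combined = {}
    for a, b in product(simple, repeat=2):
        supporting = []
        # Only sequences containing both simple motifs can support the pair.
        for s_idx in simple[a].keys() & simple[b].keys():
            starts_b = set(simple[b][s_idx])
            if any(p + length + g in starts_b
                   for p in simple[a][s_idx]
                   for g in range(gap_min, gap_max + 1)):
                supporting.append(s_idx)
        if len(supporting) >= quorum:
            combined[(a, b)] = sorted(supporting)
    return combined


if __name__ == "__main__":
    seqs = ["ACGTTTGCA", "ACGTGCAAA", "TTTTTTTTT"]
    simple = stage1_simple_motifs(seqs, length=3, quorum=2)
    pairs = stage2_combine(simple, length=3, gap_min=1, gap_max=3, quorum=2)
    print(pairs[("ACG", "GCA")])  # [0, 1]
```

Because stage 2 works only on the stage-1 output, either stage can be swapped out or parallelized independently. This kind of modularity is one of the advantages the authors attribute to the 2-stage approach.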

CONCLUSIONS

Reflecting on the results led us to conclude that a 2-stage approach can be implemented with many advantages over direct approaches. Some of these concern greater modularity, ease of parallelization, and the possibility of performing adaptive searches for structured motifs. We also noted that most of the instances that were hard for SISMA were easy to detect in advance. In these cases one may initially opt for a direct method or, as a viable alternative in most laboratories, run a direct tool and a 2-stage tool in parallel and halt the computation as soon as the first one finishes.
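The last suggestion, running a direct tool and a 2-stage tool side by side and stopping as soon as one finishes, needs nothing beyond process control. The Python sketch below uses only the standard `subprocess` module. The command lines are placeholders (no real tool invocation or flags are implied), `race` is a hypothetical helper name, and polling every `poll_interval` seconds is just one simple design choice.

```python
import subprocess
import sys
import time


def race(commands, poll_interval=0.05):
    """Start every command line, wait for the first process to exit, terminate
    the others, and return (index of the first finisher, its exit code)."""
    procs = []
    try:
        for cmd in commands:
            procs.append(subprocess.Popen(cmd))
        while True:
            for i, proc in enumerate(procs):
                code = proc.poll()
                if code is not None:
                    return i, code
            time.sleep(poll_interval)
    finally:
        # Stop whatever is still running, including on errors or Ctrl-C.
        for proc in procs:
            if proc.poll() is None:
                proc.terminate()
                proc.wait()


if __name__ == "__main__":
    # Placeholder commands standing in for a slow and a fast motif finder.
    slow_tool = [sys.executable, "-c", "import time; time.sleep(10)"]
    fast_tool = [sys.executable, "-c", "print('finished')"]
    print(race([slow_tool, fast_tool]))  # (1, 0)
```

A non-zero exit code means the first tool to stop failed rather than succeeded, so a real pipeline would probably check it before discarding the other run.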


Similar articles

1
Direct vs 2-stage approaches to structured motif finding.
Algorithms Mol Biol. 2012 Aug 21;7(1):20. doi: 10.1186/1748-7188-7-20.
2
Assessment of composite motif discovery methods.
BMC Bioinformatics. 2008 Feb 26;9:123. doi: 10.1186/1471-2105-9-123.
3
A Monte Carlo-based framework enhances the discovery and interpretation of regulatory sequence motifs.
BMC Bioinformatics. 2012 Nov 27;13:317. doi: 10.1186/1471-2105-13-317.
4
Discriminative motif discovery in DNA and protein sequences using the DEME algorithm.
BMC Bioinformatics. 2007 Oct 15;8:385. doi: 10.1186/1471-2105-8-385.
5
A cluster refinement algorithm for motif discovery.
IEEE/ACM Trans Comput Biol Bioinform. 2010 Oct-Dec;7(4):654-68. doi: 10.1109/TCBB.2009.25.
6
Discovering multiple realistic TFBS motifs based on a generalized model.
BMC Bioinformatics. 2009 Oct 7;10:321. doi: 10.1186/1471-2105-10-321.
7
Regulatory motif finding by logic regression.
Bioinformatics. 2004 Nov 1;20(16):2799-811. doi: 10.1093/bioinformatics/bth333. Epub 2004 May 27.
8
HeliCis: a DNA motif discovery tool for colocalized motif pairs with periodic spacing.
BMC Bioinformatics. 2007 Oct 28;8:418. doi: 10.1186/1471-2105-8-418.
9
Variable structure motifs for transcription factor binding sites.
BMC Genomics. 2010 Jan 14;11:30. doi: 10.1186/1471-2164-11-30.
10
Using SCOPE to identify potential regulatory motifs in coregulated genes.
J Vis Exp. 2011 May 31;(51):2703. doi: 10.3791/2703.

Cited by

1
Alignment-free method for DNA sequence clustering using Fuzzy integral similarity.
Sci Rep. 2019 Mar 6;9(1):3753. doi: 10.1038/s41598-019-40452-6.
2
Fast and accurate phylogeny reconstruction using filtered spaced-word matches.
Bioinformatics. 2017 Apr 1;33(7):971-979. doi: 10.1093/bioinformatics/btw776.
3
Estimating evolutionary distances between genomic sequences from spaced-word matches.
Algorithms Mol Biol. 2015 Feb 11;10:5. doi: 10.1186/s13015-015-0032-x. eCollection 2015.