A speedup technique for (l, d)-motif finding algorithms.

Authors

Rajasekaran Sanguthevar, Dinh Hieu

Affiliation

Department of CSE, University of Connecticut, Storrs, CT 06269, USA.

Publication information

BMC Res Notes. 2011 Mar 8;4:54. doi: 10.1186/1756-0500-4-54.

DOI:10.1186/1756-0500-4-54
PMID:21385438
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3063805/
Abstract

BACKGROUND

The discovery of patterns in DNA, RNA, and protein sequences has led to the solution of many vital biological problems. For instance, the identification of patterns in nucleic acid sequences has resulted in the determination of open reading frames, identification of promoter elements of genes, identification of intron/exon splicing sites, identification of SH RNAs, location of RNA degradation signals, identification of alternative splicing sites, etc. In protein sequences, patterns have proven to be extremely helpful in domain identification, location of protease cleavage sites, identification of signal peptides, protein interactions, determination of protein degradation elements, identification of protein trafficking elements, etc. Motifs are important patterns that are helpful in finding transcriptional regulatory elements, transcription factor binding sites, functional genomics, drug design, etc. As a result, numerous papers have been written to solve the motif search problem.

RESULTS

Three versions of the motif search problem have been proposed in the literature: Simple Motif Search (SMS), (l, d)-motif search (also called Planted Motif Search (PMS)), and Edit-distance-based Motif Search (EMS). In this paper we focus on PMS. Two kinds of algorithms can be found in the literature for solving the PMS problem: exact and approximate. An exact algorithm always identifies the motifs, whereas an approximate algorithm may fail to identify some or all of them. The exact version of the PMS problem has been shown to be NP-hard, and the exact algorithms proposed in the literature for PMS take time that is exponential in some of the underlying parameters. In this paper we propose a generic technique that can be used to speed up PMS algorithms.
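To make the exact PMS problem concrete, here is a minimal brute-force sketch. It assumes the standard (l, d) formulation, which the abstract does not spell out: a string M of length l is a motif if every input sequence contains some l-mer within Hamming distance d of M. This is not the speedup technique proposed in the paper (the abstract does not describe it); it is only a naive exact baseline, and the function names are illustrative. Enumerating all 4^l candidates over the DNA alphabet also shows why exact approaches take time exponential in l.

```python
from itertools import product


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def min_distance(candidate, sequence):
    """Smallest Hamming distance between candidate and any l-mer of sequence.

    Assumes len(sequence) >= len(candidate).
    """
    l = len(candidate)
    return min(
        hamming(candidate, sequence[i:i + l])
        for i in range(len(sequence) - l + 1)
    )


def brute_force_pms(sequences, l, d, alphabet="ACGT"):
    """Return every length-l string that occurs in all sequences with at most d mismatches.

    Tries all len(alphabet)**l candidates, so it is only usable for small l.
    """
    motifs = []
    for letters in product(alphabet, repeat=l):
        candidate = "".join(letters)
        if all(min_distance(candidate, s) <= d for s in sequences):
            motifs.append(candidate)
    return motifs


if __name__ == "__main__":
    seqs = ["ACGTAC", "TTACGA", "GACGTT"]
    # With d = 0 a motif must occur exactly in every sequence.
    print(brute_force_pms(seqs, l=3, d=0))  # ['ACG']
```

An exact algorithm in the sense used above must return the same set as this exhaustive search, only faster; an approximate algorithm may return a subset of it.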

CONCLUSIONS

We present a speedup technique that can be used with any PMS algorithm. We have tested it on a number of algorithms, and the experimental results show that it is very effective. The implementation of the algorithms is freely available at http://www.engr.uconn.edu/rajasek/PMS4.zip.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/043e/3063805/d628df0ec45c/1756-0500-4-54-1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/043e/3063805/8d38c44a68d5/1756-0500-4-54-2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/043e/3063805/ff9b3fb9bb7c/1756-0500-4-54-3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/043e/3063805/028e088bc2b1/1756-0500-4-54-4.jpg
