• 文献检索
  • 文档翻译
  • 深度研究
  • 学术资讯
  • Suppr Zotero 插件Zotero 插件
  • 邀请有礼
  • 套餐&价格
  • 历史记录
应用&插件
Suppr Zotero 插件Zotero 插件浏览器插件Mac 客户端Windows 客户端微信小程序
定价
高级版会员购买积分包购买API积分包
服务
文献检索文档翻译深度研究API 文档MCP 服务
关于我们
关于 Suppr公司介绍联系我们用户协议隐私条款
关注我们

Suppr 超能文献

核心技术专利:CN118964589B侵权必究
粤ICP备2023148730 号-1Suppr @ 2026

文献检索

告别复杂PubMed语法,用中文像聊天一样搜索,搜遍4000万医学文献。AI智能推荐,让科研检索更轻松。

立即免费搜索

文件翻译

保留排版,准确专业,支持PDF/Word/PPT等文件格式,支持 12+语言互译。

免费翻译文档

深度研究

AI帮你快速写综述,25分钟生成高质量综述,智能提取关键信息,辅助科研写作。

立即免费体验

生物序列搜索的高效算法。

Efficient algorithms for biological stems search.

机构信息

Department of Computer Science and Engineering, University of Connecticut, Storrs, CT, USA.

出版信息

BMC Bioinformatics. 2013 May 16;14:161. doi: 10.1186/1471-2105-14-161.

DOI:10.1186/1471-2105-14-161
PMID:23679045
原文链接:https://pmc.ncbi.nlm.nih.gov/articles/PMC3679804/
Abstract

BACKGROUND

Motifs are significant patterns in DNA, RNA, and protein sequences, which play an important role in biological processes and functions, like identification of open reading frames, RNA transcription, protein binding, etc. Several versions of the motif search problem have been studied in the literature. One such version is called the Planted Motif Search (PMS)or (l, d)-motif Search. PMS is known to be NP complete. The time complexities of most of the planted motif search algorithms depend exponentially on the alphabet size. Recently a new version of the motif search problem has been introduced by Kuksa and Pavlovic. We call this version as the Motif Stems Search (MSS) problem. A motif stem is an l-mer (for some relevant value of l)with some wildcard characters and hence corresponds to a set of l-mers (without wildcards), some of which are (l, d)-motifs. Kuksa and Pavlovic have presented an efficient algorithm to find motif stems for inputs from large alphabets. Ideally, the number of stems output should be as small as possible since the stems form a superset of the motifs.

RESULTS

In this paper we propose an efficient algorithm for MSS and evaluate it on both synthetic and real data. This evaluation reveals that our algorithm is much faster than Kuksa and Pavlovic's algorithm.

CONCLUSIONS

Our MSS algorithm outperforms the algorithm of Kuksa and Pavlovic in terms of the run time as well as the number of stems output. Specifically, the stems output by our algorithm form a proper (and much smaller)subset of the stems output by Kuksa and Pavlovic's algorithm.

摘要

背景

基序是 DNA、RNA 和蛋白质序列中的重要模式,在生物过程和功能中发挥着重要作用,例如识别开放阅读框、RNA 转录、蛋白质结合等。文献中已经研究了几种基序搜索问题的版本。其中一个版本称为“已种植基序搜索(PMS)”或(l,d)-基序搜索。PMS 是 NP 完全问题。大多数已种植基序搜索算法的时间复杂度都与字母表的大小呈指数级相关。最近,Kuksa 和 Pavlovic 引入了一个新的基序搜索问题版本。我们称之为“基序茎搜索(MSS)”问题。基序茎是一个 l-mer(对于某些相关的 l 值),带有一些通配符字符,因此对应于一组 l-mers(没有通配符),其中一些是(l,d)-基序。Kuksa 和 Pavlovic 提出了一种有效的算法来为来自大字母表的输入找到基序茎。理想情况下,输出的茎数应尽可能少,因为茎形成基序的超集。

结果

在本文中,我们提出了一种有效的 MSS 算法,并在合成数据和真实数据上对其进行了评估。该评估表明,我们的算法比 Kuksa 和 Pavlovic 的算法快得多。

结论

我们的 MSS 算法在运行时间和输出茎数方面都优于 Kuksa 和 Pavlovic 的算法。具体来说,我们的算法输出的茎形成了 Kuksa 和 Pavlovic 算法输出的茎的适当(且小得多)子集。

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b66d/3679804/11938b50091d/1471-2105-14-161-1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b66d/3679804/11938b50091d/1471-2105-14-161-1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b66d/3679804/11938b50091d/1471-2105-14-161-1.jpg

相似文献

1
Efficient algorithms for biological stems search.生物序列搜索的高效算法。
BMC Bioinformatics. 2013 May 16;14:161. doi: 10.1186/1471-2105-14-161.
2
An Efficient Exact Algorithm for the Motif Stem Search Problem over Large Alphabets.一种针对大字母表上基序茎搜索问题的高效精确算法。
IEEE/ACM Trans Comput Biol Bioinform. 2015 Mar-Apr;12(2):384-97. doi: 10.1109/TCBB.2014.2361668.
3
Efficient sequential and parallel algorithms for finding edit distance based motifs.用于查找基于编辑距离的基序的高效顺序和并行算法。
BMC Genomics. 2016 Aug 18;17 Suppl 4(Suppl 4):465. doi: 10.1186/s12864-016-2789-9.
4
qPMS7: a fast algorithm for finding (ℓ, d)-motifs in DNA and protein sequences.qPMS7:一种在 DNA 和蛋白质序列中查找(ℓ,d)-基序的快速算法。
PLoS One. 2012;7(7):e41425. doi: 10.1371/journal.pone.0041425. Epub 2012 Jul 24.
5
Efficient sequential and parallel algorithms for planted motif search.高效的序列和并行算法,用于种植模式搜索。
BMC Bioinformatics. 2014 Jan 31;15:34. doi: 10.1186/1471-2105-15-34.
6
PairMotif: A new pattern-driven algorithm for planted (l, d) DNA motif search.PairMotif:一种新的基于模式驱动的算法,用于搜索(l,d)DNA 基序。
PLoS One. 2012;7(10):e48442. doi: 10.1371/journal.pone.0048442. Epub 2012 Oct 31.
7
An Efficient Exact Algorithm for Planted Motif Search on Large DNA Sequence Datasets.在大型 DNA 序列数据集上进行种植基序搜索的高效精确算法。
IEEE/ACM Trans Comput Biol Bioinform. 2024 Sep-Oct;21(5):1542-1551. doi: 10.1109/TCBB.2024.3404136. Epub 2024 Oct 9.
8
A speedup technique for (l, d)-motif finding algorithms.一种用于(l,d)基序查找算法的加速技术。
BMC Res Notes. 2011 Mar 8;4:54. doi: 10.1186/1756-0500-4-54.
9
Improved Exact Enumerative Algorithms for the Planted (l, d)-Motif Search Problem.用于植入式(l, d)基序搜索问题的改进精确枚举算法。
IEEE/ACM Trans Comput Biol Bioinform. 2014 Mar-Apr;11(2):361-74. doi: 10.1109/TCBB.2014.2306842.
10
RefSelect: a reference sequence selection algorithm for planted (l, d) motif search.RefSelect:一种用于植入(l,d)基序搜索的参考序列选择算法。
BMC Bioinformatics. 2016 Jul 19;17 Suppl 9(Suppl 9):266. doi: 10.1186/s12859-016-1130-6.

本文引用的文献

1
PMS6: A Fast Algorithm for Motif Discovery.PMS6:一种用于基序发现的快速算法。
IEEE Int Conf Comput Adv Bio Med Sci. 2012:1-6. doi: 10.1109/ICCABS.2012.6182627.
2
Minimotif Miner 3.0: database expansion and significantly improved reduction of false-positive predictions from consensus sequences.Minimotif Miner 3.0:数据库扩展和从共识序列显著减少假阳性预测。
Nucleic Acids Res. 2012 Jan;40(Database issue):D252-60. doi: 10.1093/nar/gkr1189. Epub 2011 Dec 6.
3
PMS5: an efficient exact algorithm for the (ℓ, d)-motif finding problem.
PMS5:(ℓ,d)-基序发现问题的高效精确算法。
BMC Bioinformatics. 2011 Oct 24;12:410. doi: 10.1186/1471-2105-12-410.
4
A speedup technique for (l, d)-motif finding algorithms.一种用于(l,d)基序查找算法的加速技术。
BMC Res Notes. 2011 Mar 8;4:54. doi: 10.1186/1756-0500-4-54.
5
Efficient motif finding algorithms for large-alphabet inputs.针对大字母表输入的高效基序发现算法。
BMC Bioinformatics. 2010 Oct 26;11 Suppl 8(Suppl 8):S1. doi: 10.1186/1471-2105-11-S8-S1.
6
ELM: the status of the 2010 eukaryotic linear motif resource.ELM:2010 年真核线性基序资源的现状。
Nucleic Acids Res. 2010 Jan;38(Database issue):D167-80. doi: 10.1093/nar/gkp1016. Epub 2009 Nov 17.
7
Fast and practical algorithms for planted (l, d) motif search.用于植入式(l, d)基序搜索的快速实用算法。
IEEE/ACM Trans Comput Biol Bioinform. 2007 Oct-Dec;4(4):544-52. doi: 10.1109/TCBB.2007.70241.
8
Exact algorithms for planted motif problems.植入基序问题的精确算法。
J Comput Biol. 2005 Oct;12(8):1117-28. doi: 10.1089/cmb.2005.12.1117.
9
Finding subtle motifs by branching from sample strings.通过从样本字符串中分支来寻找微妙的基序。
Bioinformatics. 2003 Oct;19 Suppl 2:ii149-55. doi: 10.1093/bioinformatics/btg1072.
10
Scansite 2.0: Proteome-wide prediction of cell signaling interactions using short sequence motifs.Scansite 2.0:利用短序列基序对细胞信号相互作用进行全蛋白质组预测。
Nucleic Acids Res. 2003 Jul 1;31(13):3635-41. doi: 10.1093/nar/gkg584.