qPMS9: an efficient algorithm for quorum Planted Motif Search.

Author information

Marius Nicolae, Sanguthevar Rajasekaran

Affiliation

Department of Computer Science and Engineering, University of Connecticut, Storrs, CT, USA.

Publication information

Sci Rep. 2015 Jan 15;5:7813. doi: 10.1038/srep07813.

DOI:10.1038/srep07813
PMID:25589474
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4295094/
Abstract

Discovering patterns in biological sequences is a crucial problem. For example, the identification of patterns in DNA sequences has resulted in the determination of open reading frames, identification of gene promoter elements, intron/exon splicing sites, and shRNAs, location of RNA degradation signals, identification of alternative splicing sites, etc. In protein sequences, patterns have led to domain identification, location of protease cleavage sites, identification of signal peptides, protein interactions, determination of protein degradation elements, identification of protein trafficking elements, discovery of short functional motifs, etc. In this paper we focus on the identification of an important class of patterns, namely, motifs. We study the (ℓ, d) motif search problem or Planted Motif Search (PMS). PMS receives as input n strings and two integers ℓ and d. It returns all sequences M of length ℓ that occur in each input string, where each occurrence differs from M in at most d positions. Another formulation is quorum PMS (qPMS), where the motif appears in at least q% of the strings. We introduce qPMS9, a parallel exact qPMS algorithm that offers significant runtime improvements on DNA and protein datasets. qPMS9 solves the challenging DNA (ℓ, d)-instances (28, 12) and (30, 13). The source code is available at https://code.google.com/p/qpms9/.
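The (ℓ, d) and quorum definitions above can be made concrete with a deliberately naive Python sketch. It assumes a DNA alphabet `ACGT` and an integer percentage `q`, enumerates every candidate of length ℓ, and keeps those that occur with at most d mismatches (Hamming distance) in at least q% of the strings. This is not qPMS9's algorithm; it is exponential in ℓ and only suitable for toy inputs, and the function names `hamming`, `occurs_within` and `qpms_brute_force` are hypothetical.

```python
from itertools import product


def hamming(a: str, b: str) -> int:
    """Count positions where two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def occurs_within(motif: str, s: str, d: int) -> bool:
    """True if some length-len(motif) substring of s is within Hamming distance d of motif."""
    ell = len(motif)
    return any(hamming(motif, s[i:i + ell]) <= d for i in range(len(s) - ell + 1))


def qpms_brute_force(strings: list[str], ell: int, d: int,
                     q: int = 100, alphabet: str = "ACGT") -> list[str]:
    """Naive quorum (ell, d) motif search (hypothetical helper, not qPMS9).

    Returns every candidate M of length ell that occurs with at most d
    mismatches in at least q% of the input strings; q=100 is plain PMS.
    Cost grows as |alphabet|**ell, so this is for tiny instances only.
    """
    # Integer ceiling of q% of n: minimum number of strings that must contain M.
    need = (q * len(strings) + 99) // 100
    motifs = []
    for letters in product(alphabet, repeat=ell):
        m = "".join(letters)
        hits = sum(occurs_within(m, s, d) for s in strings)
        if hits >= need:
            motifs.append(m)
    return motifs


if __name__ == "__main__":
    print(qpms_brute_force(["ACGTT", "CCGTA", "TCGTG"], 3, 0))
```

For example, `qpms_brute_force(["ACGTT", "CCGTA", "TCGTG"], 3, 0)` returns `['CGT']`, the only 3-mer shared exactly by all three strings. For the DNA instance (28, 12) this enumeration would have to test 4^28 (about 7.2 × 10^16) candidates, which is why dedicated exact algorithms are needed.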

Figures:
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b53d/4295094/7ac5d6fbbafb/srep07813-f1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b53d/4295094/455296b5bee0/srep07813-f2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b53d/4295094/9f6691cc3621/srep07813-f3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b53d/4295094/2a7b3f97fc9d/srep07813-f4.jpg

Similar articles

1. qPMS9: an efficient algorithm for quorum Planted Motif Search.
   Sci Rep. 2015 Jan 15;5:7813. doi: 10.1038/srep07813.
2. A speedup technique for (l, d)-motif finding algorithms.
   BMC Res Notes. 2011 Mar 8;4:54. doi: 10.1186/1756-0500-4-54.
3. qPMS7: a fast algorithm for finding (ℓ, d)-motifs in DNA and protein sequences.
   PLoS One. 2012;7(7):e41425. doi: 10.1371/journal.pone.0041425. Epub 2012 Jul 24.
4. SamSelect: a sample sequence selection algorithm for quorum planted motif search on large DNA datasets.
   BMC Bioinformatics. 2018 Jun 18;19(1):228. doi: 10.1186/s12859-018-2242-y.
5. Efficient sequential and parallel algorithms for finding edit distance based motifs.
   BMC Genomics. 2016 Aug 18;17 Suppl 4(Suppl 4):465. doi: 10.1186/s12864-016-2789-9.
6. Efficient sequential and parallel algorithms for planted motif search.
   BMC Bioinformatics. 2014 Jan 31;15:34. doi: 10.1186/1471-2105-15-34.
7. PMS5: an efficient exact algorithm for the (ℓ, d)-motif finding problem.
   BMC Bioinformatics. 2011 Oct 24;12:410. doi: 10.1186/1471-2105-12-410.
8. Fast exact algorithms for the closest string and substring problems with application to the planted (L, d)-motif model.
   IEEE/ACM Trans Comput Biol Bioinform. 2011 Sep-Oct;8(5):1400-10. doi: 10.1109/TCBB.2011.21.
9. An Efficient Exact Algorithm for Planted Motif Search on Large DNA Sequence Datasets.
   IEEE/ACM Trans Comput Biol Bioinform. 2024 Sep-Oct;21(5):1542-1551. doi: 10.1109/TCBB.2024.3404136. Epub 2024 Oct 9.
10. Freezing firefly algorithm for efficient planted (ℓ, d) motif search.
   Med Biol Eng Comput. 2022 Feb;60(2):511-530. doi: 10.1007/s11517-021-02468-x. Epub 2022 Jan 12.

Cited by

1. A Review on Planted (ℓ, d) Motif Discovery Algorithms for Medical Diagnose.
   Sensors (Basel). 2022 Feb 5;22(3):1204. doi: 10.3390/s22031204.
2. Novel algorithms for LDD motif search.
   BMC Genomics. 2019 Jun 6;20(Suppl 5):424. doi: 10.1186/s12864-019-5701-6.
3. SamSelect: a sample sequence selection algorithm for quorum planted motif search on large DNA datasets.
   BMC Bioinformatics. 2018 Jun 18;19(1):228. doi: 10.1186/s12859-018-2242-y.
