Efficient motif finding algorithms for large-alphabet inputs.

Affiliation

Department of Computer Science, Rutgers University, Piscataway, NJ 08854, USA.

Publication information

BMC Bioinformatics. 2010 Oct 26;11 Suppl 8(Suppl 8):S1. doi: 10.1186/1471-2105-11-S8-S1.

DOI:10.1186/1471-2105-11-S8-S1
PMID:21034426
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2966288/
Abstract

BACKGROUND

We consider the problem of identifying motifs (recurring or conserved patterns) in biological sequence data sets. To this end, we present a new deterministic algorithm for finding patterns that are embedded as exact or inexact instances in all or most of the input strings.
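To make the problem statement concrete, here is a minimal brute-force sketch of motif search with inexact instances. It is not the algorithm proposed in the paper; it is the naive baseline that exact algorithms aim to beat. Two things are assumptions made for illustration: "inexact" is modelled as Hamming distance of at most `d`, and the names `hamming`, `occurs_within`, `find_motifs`, and `quorum` are hypothetical. The loop visits every length-`k` string over the alphabet, so the work grows as `len(alphabet) ** k`, which hints at why large alphabets such as the 20 amino acids are the difficult case.

```python
from itertools import product


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def occurs_within(motif: str, seq: str, d: int) -> bool:
    """True if some window of seq is within Hamming distance d of motif."""
    k = len(motif)
    return any(hamming(motif, seq[i:i + k]) <= d
               for i in range(len(seq) - k + 1))


def find_motifs(seqs, k, d, alphabet="ACGT", quorum=None):
    """Naive search: test every length-k string over the alphabet.

    quorum is the minimum number of sequences that must contain an
    instance ("all or most" of the inputs); it defaults to all of them.
    """
    need = len(seqs) if quorum is None else quorum
    hits = []
    for candidate in product(alphabet, repeat=k):
        motif = "".join(candidate)
        support = sum(occurs_within(motif, s, d) for s in seqs)
        if support >= need:
            hits.append(motif)
    return hits


# Toy input: "ACGT" appears exactly in the first sequence and with
# one mismatch in the other two ("ACCT" and "ACGA").
seqs = ["TTACGTAA", "GGACCTCC", "CAACGAGT"]
print(find_motifs(seqs, k=4, d=1))
```

Lowering `quorum` below `len(seqs)` relaxes the requirement from "all" to "most" input strings; raising `d` admits weaker instances at the cost of more spurious hits.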

RESULTS

The proposed algorithm (1) improves search efficiency compared to existing algorithms, and (2) scales well with the size of the alphabet. On a synthetic planted DNA motif finding problem, our algorithm is over 10× more efficient than MITRA, PMSPrune, and RISOTTO for long motifs. Improvements are orders of magnitude higher in the same setting with large alphabets. On benchmark TF-binding site problems (FNP, CRP, LexA) we observed a reduction in running time of over 12×, with high detection accuracy. The algorithm was also successful in rapidly identifying protein motifs in Lipocalin and Zinc metallopeptidase, and supersecondary structure motifs for the Cadherin and Immunoglobulin families.

CONCLUSIONS

Our algorithm reduces the computational complexity of current motif finding algorithms and demonstrates strong running-time improvements over existing exact algorithms, especially in the important and difficult case of large-alphabet sequences.

Figures 1 to 6 (images hosted on PMC)

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/388c/2966288/04c1faa953c9/1471-2105-11-S8-S1-1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/388c/2966288/a3ec25750f64/1471-2105-11-S8-S1-2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/388c/2966288/5ea474a03abb/1471-2105-11-S8-S1-3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/388c/2966288/cd9bf6ae569b/1471-2105-11-S8-S1-4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/388c/2966288/b7c9bd3b6a9f/1471-2105-11-S8-S1-5.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/388c/2966288/d4de7daa3e92/1471-2105-11-S8-S1-6.jpg
