• 文献检索
  • 文档翻译
  • 深度研究
  • 学术资讯
  • Suppr Zotero 插件Zotero 插件
  • 邀请有礼
  • 套餐&价格
  • 历史记录
应用&插件
Suppr Zotero 插件Zotero 插件浏览器插件Mac 客户端Windows 客户端微信小程序
定价
高级版会员购买积分包购买API积分包
服务
文献检索文档翻译深度研究API 文档MCP 服务
关于我们
关于 Suppr公司介绍联系我们用户协议隐私条款
关注我们

Suppr 超能文献

核心技术专利:CN118964589B侵权必究
粤ICP备2023148730 号-1Suppr @ 2026

文献检索

告别复杂PubMed语法,用中文像聊天一样搜索,搜遍4000万医学文献。AI智能推荐,让科研检索更轻松。

立即免费搜索

文件翻译

保留排版,准确专业,支持PDF/Word/PPT等文件格式,支持 12+语言互译。

免费翻译文档

深度研究

AI帮你快速写综述,25分钟生成高质量综述,智能提取关键信息,辅助科研写作。

立即免费体验

EMS3:一种用于寻找基于编辑距离的基序的改进算法。

EMS3: An Improved Algorithm for Finding Edit-Distance Based Motifs.

作者信息

Xiao Peng, Cai Xingyu, Rajasekaran Sanguthevar

出版信息

IEEE/ACM Trans Comput Biol Bioinform. 2021 Jan-Feb;18(1):27-37. doi: 10.1109/TCBB.2020.3024222. Epub 2021 Feb 3.

DOI:10.1109/TCBB.2020.3024222
PMID:32931433
Abstract

Discovering patterns in biological sequences is a crucial step to extract useful information from them. Motifs can be viewed as patterns that occur exactly or with minor changes across some or all of the biological sequences. Motif search has numerous applications including the identification of transcription factors and their binding sites, composite regulatory patterns, similarity among families of proteins, etc. The general problem of motif search is intractable. One of the most studied models of motif search proposed in literature is Edit-distance based Motif Search (EMS). In EMS, the goal is to find all the patterns of length l that occur with an edit-distance of at most d in each of the input sequences. EMS algorithms existing in the literature do not scale well on challenging instances and large datasets. In this paper, the current state-of-the-art EMS solver is advanced by exploiting the idea of dimension reduction. A novel idea to reduce the cardinality of the alphabet is proposed. The algorithm we propose, EMS3, is an exact algorithm. I.e., it finds all the motifs present in the input sequences. EMS3 can be also viewed as a divide and conquer algorithm. In this paper, we provide theoretical analyses to establish the efficiency of EMS3. Extensive experiments on standard benchmark datasets (synthetic and real-world) show that the proposed algorithm outperforms the existing state-of-the-art algorithm (EMS2).

摘要

在生物序列中发现模式是从这些序列中提取有用信息的关键步骤。基序可以被视为在部分或所有生物序列中精确出现或略有变化的模式。基序搜索有许多应用,包括转录因子及其结合位点的识别、复合调控模式、蛋白质家族间的相似性等。基序搜索的一般问题是难以处理的。文献中提出的最受研究的基序搜索模型之一是基于编辑距离的基序搜索(EMS)。在EMS中,目标是找到在每个输入序列中编辑距离至多为d出现的所有长度为l的模式。文献中现有的EMS算法在具有挑战性的实例和大型数据集上扩展性不佳。在本文中,通过利用降维思想改进了当前最先进的EMS求解器。提出了一种降低字母表基数的新颖想法。我们提出的算法EMS3是一种精确算法。即,它能找到输入序列中存在的所有基序。EMS3也可以被视为一种分治算法。在本文中,我们提供理论分析以确定EMS3的效率。在标准基准数据集(合成和真实世界)上进行的大量实验表明,所提出的算法优于现有的最先进算法(EMS2)。

相似文献

1
EMS3: An Improved Algorithm for Finding Edit-Distance Based Motifs.EMS3:一种用于寻找基于编辑距离的基序的改进算法。
IEEE/ACM Trans Comput Biol Bioinform. 2021 Jan-Feb;18(1):27-37. doi: 10.1109/TCBB.2020.3024222. Epub 2021 Feb 3.
2
Efficient sequential and parallel algorithms for finding edit distance based motifs.用于查找基于编辑距离的基序的高效顺序和并行算法。
BMC Genomics. 2016 Aug 18;17 Suppl 4(Suppl 4):465. doi: 10.1186/s12864-016-2789-9.
3
Novel algorithms for LDD motif search.新型 LDD 基序搜索算法。
BMC Genomics. 2019 Jun 6;20(Suppl 5):424. doi: 10.1186/s12864-019-5701-6.
4
Efficient motif finding algorithms for large-alphabet inputs.针对大字母表输入的高效基序发现算法。
BMC Bioinformatics. 2010 Oct 26;11 Suppl 8(Suppl 8):S1. doi: 10.1186/1471-2105-11-S8-S1.
5
An Efficient Exact Algorithm for the Motif Stem Search Problem over Large Alphabets.一种针对大字母表上基序茎搜索问题的高效精确算法。
IEEE/ACM Trans Comput Biol Bioinform. 2015 Mar-Apr;12(2):384-97. doi: 10.1109/TCBB.2014.2361668.
6
Discovering Motifs in Biological Sequences Using the Micron Automata Processor.使用微米自动机处理器在生物序列中发现基序。
IEEE/ACM Trans Comput Biol Bioinform. 2016 Jan-Feb;13(1):99-111. doi: 10.1109/TCBB.2015.2430313.
7
Finding motifs with insufficient number of strong binding sites.发现具有数量不足的强结合位点的基序。
J Comput Biol. 2005 Jul-Aug;12(6):686-701. doi: 10.1089/cmb.2005.12.686.
8
qPMS7: a fast algorithm for finding (ℓ, d)-motifs in DNA and protein sequences.qPMS7:一种在 DNA 和蛋白质序列中查找(ℓ,d)-基序的快速算法。
PLoS One. 2012;7(7):e41425. doi: 10.1371/journal.pone.0041425. Epub 2012 Jul 24.
9
Improved Exact Enumerative Algorithms for the Planted (l, d)-Motif Search Problem.用于植入式(l, d)基序搜索问题的改进精确枚举算法。
IEEE/ACM Trans Comput Biol Bioinform. 2014 Mar-Apr;11(2):361-74. doi: 10.1109/TCBB.2014.2306842.
10
Graphical approach to weak motif recognition.
Genome Inform. 2004;15(2):52-62.