Discovering Motifs in Biological Sequences Using the Micron Automata Processor.

Author information

Roy Indranil, Aluru Srinivas

Publication information

IEEE/ACM Trans Comput Biol Bioinform. 2016 Jan-Feb;13(1):99-111. doi: 10.1109/TCBB.2015.2430313.

DOI:10.1109/TCBB.2015.2430313

Abstract

Finding approximately conserved sequences, called motifs, across multiple DNA or protein sequences is an important problem in computational biology. In this paper, we consider the (l, d) motif search problem of identifying one or more motifs of length l present in at least q of the n given sequences, with each occurrence differing from the motif in at most d substitutions. The problem is known to be NP-complete, and the largest solved instance reported to date is (26, 11). We propose a novel algorithm for the (l, d) motif search problem using streaming execution over a large set of non-deterministic finite automata (NFA). This solution is designed to take advantage of the Micron Automata Processor, a new technology close to deployment that can execute multiple NFA in parallel. We demonstrate the capability for solving much larger instances of the (l, d) motif search problem using the resources available within a single automata processor board, by estimating run-times for problem instances (39, 18) and (40, 17). The paper serves as a useful guide to solving problems using this new accelerator technology.
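
The problem statement in the abstract can be made concrete with a small brute-force sketch. This is not the NFA-based algorithm the paper proposes; it is a naive enumeration, assuming a DNA alphabet, that only illustrates what counts as an (l, d) motif with support q. The function names (hamming_within, occurs_in, find_motifs) are hypothetical and chosen for illustration.

```python
from itertools import product


def hamming_within(a: str, b: str, d: int) -> bool:
    """Return True if equal-length strings a and b differ in at most d positions."""
    mismatches = 0
    for x, y in zip(a, b):
        if x != y:
            mismatches += 1
            if mismatches > d:
                return False
    return True


def occurs_in(candidate: str, sequence: str, d: int) -> bool:
    """Return True if some window of the sequence is within d substitutions of candidate."""
    l = len(candidate)
    return any(
        hamming_within(candidate, sequence[i:i + l], d)
        for i in range(len(sequence) - l + 1)
    )


def find_motifs(sequences, l, d, q, alphabet="ACGT"):
    """Enumerate every length-l string and keep those occurring in at least q sequences.

    Exponential in l (|alphabet|**l candidates), so usable only for toy inputs.
    The alphabet default is an assumption for DNA; pass another for proteins.
    """
    motifs = []
    for letters in product(alphabet, repeat=l):
        candidate = "".join(letters)
        support = sum(occurs_in(candidate, s, d) for s in sequences)
        if support >= q:
            motifs.append(candidate)
    return motifs


if __name__ == "__main__":
    toy = ["ACGTACGT", "TTACGATT", "GGGACCTA"]
    # Ask for (4, 1) motifs present in all three toy sequences.
    print(find_motifs(toy, l=4, d=1, q=3))
```

On these toy sequences, ACGT is among the reported (4, 1) motifs with q = 3: it appears exactly in the first sequence and with one substitution in the other two (ACGA and ACCT). The candidate space grows as 4^l for DNA, which is why instances such as (26, 11) are already hard and why the paper turns to parallel automata hardware.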

Similar articles

Efficient sequential and parallel algorithms for finding edit distance based motifs.

BMC Genomics. 2016 Aug 18;17 Suppl 4(Suppl 4):465. doi: 10.1186/s12864-016-2789-9.

Improved Exact Enumerative Algorithms for the Planted (l, d)-Motif Search Problem.

IEEE/ACM Trans Comput Biol Bioinform. 2014 Mar-Apr;11(2):361-74. doi: 10.1109/TCBB.2014.2306842.

Discovering sequence motifs.

Methods Mol Biol. 2008;452:231-51. doi: 10.1007/978-1-60327-159-2_12.

An Efficient Exact Algorithm for the Motif Stem Search Problem over Large Alphabets.

IEEE/ACM Trans Comput Biol Bioinform. 2015 Mar-Apr;12(2):384-97. doi: 10.1109/TCBB.2014.2361668.

qPMS7: a fast algorithm for finding (ℓ, d)-motifs in DNA and protein sequences.

PLoS One. 2012;7(7):e41425. doi: 10.1371/journal.pone.0041425. Epub 2012 Jul 24.

Fast exact algorithms for the closest string and substring problems with application to the planted (L, d)-motif model.

IEEE/ACM Trans Comput Biol Bioinform. 2011 Sep-Oct;8(5):1400-10. doi: 10.1109/TCBB.2011.21.

Efficient sequential and parallel algorithms for planted motif search.

BMC Bioinformatics. 2014 Jan 31;15:34. doi: 10.1186/1471-2105-15-34.

An Algorithm for Motif Discovery with Iteration on Lengths of Motifs.

IEEE/ACM Trans Comput Biol Bioinform. 2015 Jan-Feb;12(1):136-41. doi: 10.1109/TCBB.2014.2351793.

EMD: an ensemble algorithm for discovering regulatory motifs in DNA sequences.

BMC Bioinformatics. 2006 Jul 13;7:342. doi: 10.1186/1471-2105-7-342.
