Randomized algorithms for motif detection.

Author information

Wang Lusheng, Dong Liang

Affiliation

Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong, P. R. China.

Publication information

J Bioinform Comput Biol. 2005 Oct;3(5):1039-52. doi: 10.1142/s0219720005001508.

Abstract

MOTIVATION

Motif detection for DNA sequences has many important applications in biological studies, e.g. locating binding sites and regulatory signals, designing genetic probes, etc. In this paper, we propose a randomized algorithm, design an improved EM (expectation-maximization) algorithm, and combine them to form a software tool.

RESULTS

(1) We design a randomized algorithm for the consensus pattern problem. We show that, with high probability, our randomized algorithm finds a pattern in polynomial time with cost error at most ε × l for each string, where l is the length of the motif and ε can be any positive number given by the user. (2) We design an improved EM algorithm that outperforms the original EM algorithm. (3) We develop a software tool, MotifDetector, that uses our randomized algorithm to find good seeds and uses the improved EM algorithm to do local search. We compare MotifDetector with Buhler and Tompa's PROJECTION, which is considered to be the best known software for motif detection. Simulations show that MotifDetector is slower than PROJECTION when the pattern length is relatively small, and outperforms PROJECTION when the pattern length becomes large.
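
The abstract does not give the algorithm itself, so the following is only a minimal sketch of the general idea behind sampling-based seed finding for the consensus pattern problem, not the authors' method. It assumes the usual cost definition (for each string, the smallest Hamming distance between the pattern and any length-l window, summed over all strings). All function names and the parameters `r` and `trials` are hypothetical, chosen for illustration.

```python
import random
from collections import Counter


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def best_window(seq, pattern):
    """Return (distance, start) of the window of seq closest to pattern."""
    l = len(pattern)
    return min((hamming(seq[i:i + l], pattern), i)
               for i in range(len(seq) - l + 1))


def consensus_cost(seqs, pattern):
    """Sum over all sequences of the distance to the closest window."""
    return sum(best_window(s, pattern)[0] for s in seqs)


def majority_consensus(lmers):
    """Column-wise most frequent letter of equal-length strings."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*lmers))


def random_seed_search(seqs, l, r=3, trials=200, rng=None):
    """Illustrative random sampling: draw r windows from r distinct
    sequences, take their majority consensus as a candidate seed, and
    keep the candidate with the lowest consensus cost."""
    rng = rng or random.Random(0)  # fixed seed so runs are repeatable
    best = None
    for _ in range(trials):
        sample = []
        for s in rng.sample(seqs, r):
            i = rng.randrange(len(s) - l + 1)
            sample.append(s[i:i + l])
        cand = majority_consensus(sample)
        cost = consensus_cost(seqs, cand)
        if best is None or cost < best[0]:
            best = (cost, cand)
    return best  # (cost, pattern)
```

In a pipeline of the kind described in (3), a seed returned by `random_seed_search` would then be handed to a local-search step such as EM; that step is omitted here because the abstract does not specify how the improved EM algorithm differs from the original.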

AVAILABILITY

It is available for free at http://www.cs.cityu.edu.hk/~lwang/software/motif/index.html, subject to copyright restrictions.


