Discovering sequence motifs with arbitrary insertions and deletions.

Author information

Frith Martin C, Saunders Neil F W, Kobe Bostjan, Bailey Timothy L

Affiliation

Computational Biology Research Center, National Institute of Advanced Industrial Science and Technology (AIST), Tokyo, Japan.

Publication information

PLoS Comput Biol. 2008 May 9;4(4):e1000071. doi: 10.1371/journal.pcbi.1000071.

Abstract

Biology is encoded in molecular sequences: deciphering this encoding remains a grand scientific challenge. Functional regions of DNA, RNA, and protein sequences often exhibit characteristic but subtle motifs; thus, computational discovery of motifs in sequences is a fundamental and much-studied problem. However, most current algorithms do not allow for insertions or deletions (indels) within motifs, and the few that do have other limitations. We present a method, GLAM2 (Gapped Local Alignment of Motifs), for discovering motifs allowing indels in a fully general manner, and a companion method GLAM2SCAN for searching sequence databases using such motifs. GLAM2 is a generalization of the gapless Gibbs sampling algorithm. It re-discovers variable-width protein motifs from the PROSITE database significantly more accurately than the alternative methods PRATT and SAM-T2K. Furthermore, it usefully refines protein motifs from the ELM database: in some cases, the refined motifs make orders of magnitude fewer overpredictions than the original ELM regular expressions. GLAM2 performs respectably on the BAliBASE multiple alignment benchmark, and may be superior to leading multiple alignment methods for "motif-like" alignments with N- and C-terminal extensions. Finally, we demonstrate the use of GLAM2 to discover protein kinase substrate motifs and a gapped DNA motif for the LIM-only transcriptional regulatory complex: using GLAM2SCAN, we identify promising targets for the latter. GLAM2 is especially promising for short protein motifs, and it should improve our ability to identify the protein cleavage sites, interaction sites, post-translational modification attachment sites, etc., that underlie much of biology. It may be equally useful for arbitrarily gapped motifs in DNA and RNA, although fewer examples of such motifs are known at present. GLAM2 is public domain software, available for download at http://bioinformatics.org.au/glam2.
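To make the notion of a regular-expression motif with a variable-width gap concrete, the following minimal Python sketch scans a few toy protein sequences with a single pattern. It is only an illustration of the kind of rigid pattern matching the abstract contrasts with GLAM2, not GLAM2's own method. The pattern, the sequences, and the function name `find_motif_sites` are all hypothetical and are not taken from ELM, PROSITE, or the paper.

```python
import re

# Hypothetical motif (an assumption for illustration, not an ELM or PROSITE entry):
# arginine, then one or two arbitrary residues (a variable-width gap),
# then serine or threonine, then proline.
MOTIF = re.compile(r"R.{1,2}[ST]P")

def find_motif_sites(sequence, pattern=MOTIF):
    """Return (start, matched_text) for each non-overlapping match in sequence."""
    return [(m.start(), m.group()) for m in pattern.finditer(sequence)]

# Toy sequences, invented for this example.
sequences = {
    "seq1": "MKRASPLL",  # gap of one residue between R and S
    "seq2": "GGRAKTPQ",  # gap of two residues between R and T
    "seq3": "MKRSPLLA",  # no gap residue at all, so the pattern does not match
}

for name, seq in sequences.items():
    print(name, find_motif_sites(seq))
```

The third sequence shows the limitation of a hard-coded gap range: a site whose spacing falls outside the range written into the expression is missed entirely, whereas a model that allows insertions and deletions in general could still score it.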

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/72ea/2323616/a7d27c5f26aa/pcbi.1000071.g001.jpg
