一种用于更快的结构RNA茎环搜索的迷你贪心算法。

A mini-greedy algorithm for faster structural RNA stem-loop search.

作者信息

Gorodkin J, Lyngso R B, Stormo G D

机构信息

Bioinformatics Research Center and Department of Genetics and Ecology, University of Aarhus, Building 540, Ny Munkegade, DK-8000 Aarhus, Denmark.

出版信息

Genome Inform. 2001;12:184-93.

PMID:11791237

Abstract

When a set of coregulated genes share a common structural RNA motif, e.g. a hairpin, most motif search approaches fail to locate the covarying but structurally conserved motif. There do exist methods that can locate structural RNA motifs, like FOLDALIGN, but the main problem with these methods is that they are computationally expensive. In FOLDALIGN, a major contribution to this is the use of a greedy algorithm to construct the multiple alignment. To ensure good quality many redundant computations must be made. However, by applying the greedy algorithm on a carefully selected subset of sequences, near full greedy quality can be obtained. The basic idea is to estimate the order in which the sequences entered a good greedy alignment. If such a ranking, found from all pairwise alignments, is in good agreement with the order of appearance in the multiple alignment, the core structural motif can be found by performing the greedy algorithm on just the top sequences in the ranking. The ranking used in this mini-greedy algorithm is found by using two complementing approaches: 1) When interpreting the FOLDALIGN score as an inner product (kernel), the sequences can be ranked according to their distance to their center of mass; 2) We construct an algorithm that attempts to find the K closest sequences in the vector space associated with the inner product, and the remaining sequences can be ranked by their minimum distance to any of the sequences, or to the center of mass in this set. The two approaches arecompared and merged, and the results discussed. We also show that structural alignments of near full greedy quality can found in significantly reduced time, using these methods. The algorithm is being included in the SLASH (Stem-Loop Align SearcH) server available at http://www.bioinf.au.dk/slash.

摘要

当一组共调控基因共享一个共同的结构RNA基序（例如发夹结构）时，大多数基序搜索方法都无法定位到协同变化但结构保守的基序。确实存在一些能够定位结构RNA基序的方法，比如FOLDALIGN，但这些方法的主要问题在于计算成本高昂。在FOLDALIGN中，造成这种情况的一个主要因素是使用了贪婪算法来构建多重比对。为了确保高质量，必须进行许多冗余计算。然而，通过在精心挑选的序列子集上应用贪婪算法，可以获得接近完全贪婪算法的质量。基本思路是估计序列进入良好贪婪比对的顺序。如果从所有两两比对中找到的这种排序与多重比对中的出现顺序高度一致，那么通过仅对排序中靠前的序列执行贪婪算法，就可以找到核心结构基序。这种迷你贪婪算法中使用的排序是通过两种互补方法找到的：1）当将FOLDALIGN分数解释为内积（核）时，可以根据序列到其质心的距离对序列进行排序；2）我们构建了一种算法，试图在与内积相关的向量空间中找到K个最接近的序列，其余序列可以根据它们到这些序列中任何一个的最小距离，或者到该集合质心的最小距离进行排序。对这两种方法进行了比较和合并，并讨论了结果。我们还表明，使用这些方法可以在显著减少的时间内找到接近完全贪婪算法质量的结构比对。该算法已被纳入可在http://www.bioinf.au.dk/slash获取的SLASH（茎环比对搜索）服务器中。

相似文献

A mini-greedy algorithm for faster structural RNA stem-loop search.一种用于更快的结构RNA茎环搜索的迷你贪心算法。

Genome Inform. 2001;12:184-93.

The FOLDALIGN web server for pairwise structural RNA alignment and mutual motif search.用于成对结构RNA比对和共同基序搜索的FOLDALIGN网络服务器。

Nucleic Acids Res. 2005 Jul 1;33(Web Server issue):W650-3. doi: 10.1093/nar/gki473.

Pairwise local structural alignment of RNA sequences with sequence similarity less than 40%.序列相似性低于40%的RNA序列的成对局部结构比对。

Bioinformatics. 2005 May 1;21(9):1815-24. doi: 10.1093/bioinformatics/bti279. Epub 2005 Jan 18.

A graph theoretical approach for predicting common RNA secondary structure motifs including pseudoknots in unaligned sequences.一种用于预测未比对序列中包括假结在内的常见RNA二级结构基序的图论方法。

Bioinformatics. 2004 Jul 10;20(10):1591-602. doi: 10.1093/bioinformatics/bth131. Epub 2004 Feb 12.

[A method for prediction of conserved RNA secondary structures].[一种预测保守RNA二级结构的方法]

Mol Biol (Mosk). 2007 Jul-Aug;41(4):711-8.

Murlet: a practical multiple alignment tool for structural RNA sequences.Murlet：一种用于结构RNA序列的实用多序列比对工具。

Bioinformatics. 2007 Jul 1;23(13):1588-98. doi: 10.1093/bioinformatics/btm146. Epub 2007 Apr 25.

Multiple structural alignment and clustering of RNA sequences.RNA序列的多重结构比对与聚类

Bioinformatics. 2007 Apr 15;23(8):926-32. doi: 10.1093/bioinformatics/btm049. Epub 2007 Feb 25.

Fast model-based protein homology detection without alignment.基于快速模型的无需比对的蛋白质同源性检测。

Bioinformatics. 2007 Jul 15;23(14):1728-36. doi: 10.1093/bioinformatics/btm247. Epub 2007 May 8.

Discovering common stem-loop motifs in unaligned RNA sequences.在未比对的RNA序列中发现常见的茎环基序。

Nucleic Acids Res. 2001 May 15;29(10):2135-44. doi: 10.1093/nar/29.10.2135.

SCARNA: fast and accurate structural alignment of RNA sequences by matching fixed-length stem fragments.SCARNA：通过匹配固定长度的茎片段实现RNA序列的快速准确结构比对。

Bioinformatics. 2006 Jul 15;22(14):1723-9. doi: 10.1093/bioinformatics/btl177. Epub 2006 May 11.

引用本文的文献

RNA motif discovery: a computational overview.RNA 基序发现：计算概述

Biol Direct. 2015 Oct 9;10:61. doi: 10.1186/s13062-015-0090-5.

Fast pairwise structural RNA alignments by pruning of the dynamical programming matrix.通过修剪动态规划矩阵实现快速成对结构RNA比对。

PLoS Comput Biol. 2007 Oct;3(10):1896-908. doi: 10.1371/journal.pcbi.0030193. Epub 2007 Aug 20.

文献检索

告别复杂PubMed语法，用中文像聊天一样搜索，搜遍4000万医学文献。AI智能推荐，让科研检索更轻松。

立即免费搜索

文件翻译

保留排版，准确专业，支持PDF/Word/PPT等文件格式，支持 12+语言互译。

免费翻译文档

深度研究

AI帮你快速写综述，25分钟生成高质量综述，智能提取关键信息，辅助科研写作。

立即免费体验

一种用于更快的结构RNA茎环搜索的迷你贪心算法。

A mini-greedy algorithm for faster structural RNA stem-loop search.

作者信息

机构信息

出版信息

相似文献

引用本文的文献

文献检索

文件翻译

深度研究

Suppr 超能文献

相似文献

引用本文的文献