Liu Jianghui, Wang Jason T L, Hu Jun, Tian Bin
Department of Biochemistry and Molecular Biology, New Jersey Medical School, University of Medicine and Dentistry of New Jersey, Newark, NJ 07101, USA.
BMC Bioinformatics. 2005 Apr 7;6:89. doi: 10.1186/1471-2105-6-89.
Alignment of RNA secondary structures is important in studying functional RNA motifs. In recent years, much progress has been made in RNA motif finding and structure alignment. However, existing tools either require a large number of prealigned structures or suffer from high time complexity. This makes it difficult for these tools to process RNAs for which prealigned structures are unavailable, or to search very large RNA structure databases.
We present an efficient tool called RSmatch for aligning RNA secondary structures and for motif detection. Motivated by widely used algorithms for RNA folding, we decompose an RNA secondary structure into a set of atomic structure components, which are then organized in a tree model that captures the particular features of the structure. RSmatch can find the optimal global or local alignment between two RNA secondary structures using two scoring matrices, one for single-stranded regions and the other for double-stranded regions. The time complexity of RSmatch is O(mn), where m is the size of the query structure and n is the size of the subject structure. When applied to searching a structure database, RSmatch can find similar RNA substructures and can perform multiple structure alignment and iterative database search. It can therefore be used to identify functional RNA motifs. We tested the accuracy of RSmatch in experiments on a number of known RNA structures, including simple stem-loops and complex structures containing junctions.
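To make the decomposition idea concrete, the following is a minimal sketch, not the RSmatch implementation. It assumes the structure is given in dot-bracket notation and treats each base pair and each unpaired base as one component, nested under the base pair that encloses it. The names `Component`, `decompose`, and `count_components` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Component:
    """One node of the illustrative tree: a base pair, a single base, or the root."""
    kind: str                       # "root", "pair", or "single"
    positions: Tuple[int, ...]      # 0-based sequence indices covered by this node
    bases: str                      # nucleotide(s) at those positions
    children: List["Component"] = field(default_factory=list)


def decompose(seq: str, dots: str) -> Component:
    """Build a tree of components from a sequence and its dot-bracket structure.

    Assumption: '(' and ')' mark paired bases, '.' marks unpaired bases,
    and pseudoknots are not present.
    """
    if len(seq) != len(dots):
        raise ValueError("sequence and structure must have equal length")
    root = Component("root", (), "")
    stack = [root]                  # stack[-1] is the currently enclosing node
    for i, ch in enumerate(dots):
        if ch == "(":
            node = Component("pair", (i,), seq[i])
            stack[-1].children.append(node)
            stack.append(node)
        elif ch == ")":
            if len(stack) == 1:
                raise ValueError("unbalanced ')' at position %d" % i)
            node = stack.pop()
            # Complete the pair with its 3' partner.
            node.positions = (node.positions[0], i)
            node.bases += seq[i]
        elif ch == ".":
            stack[-1].children.append(Component("single", (i,), seq[i]))
        else:
            raise ValueError("unexpected character %r" % ch)
    if len(stack) != 1:
        raise ValueError("unbalanced '(' in structure")
    return root


def count_components(node: Component) -> Tuple[int, int]:
    """Return (number of base pairs, number of unpaired bases) under node."""
    pairs = 1 if node.kind == "pair" else 0
    singles = 1 if node.kind == "single" else 0
    for child in node.children:
        p, s = count_components(child)
        pairs += p
        singles += s
    return pairs, singles


if __name__ == "__main__":
    # A small stem-loop: three stacked base pairs closing a three-base loop.
    tree = decompose("GGGAAACCC", "(((...)))")
    print(count_components(tree))
```

In a scheme like this, the two kinds of component map naturally onto the two scoring matrices described above: pair-to-pair comparisons would consult the double-stranded matrix and single-to-single comparisons the single-stranded one. The score values themselves are not given here, so they are left out of the sketch.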
With respect to computing efficiency and accuracy, RSmatch compares favorably with other tools for RNA structure alignment and motif detection. The tool should be useful to researchers interested in comparing RNA structures obtained from wet lab experiments or RNA folding programs, particularly when the structure dataset is large.