Murlet: a practical multiple alignment tool for structural RNA sequences.

Authors

Kiryu Hisanori, Tabei Yasuo, Kin Taishin, Asai Kiyoshi

Affiliation

Computational Biology Research Center, National Institute of Advanced Industrial Science and Technology, 2-42 Aomi, Koto-ku, Tokyo 135-0064, Japan.

Publication

Bioinformatics. 2007 Jul 1;23(13):1588-98. doi: 10.1093/bioinformatics/btm146. Epub 2007 Apr 25.

Abstract

MOTIVATION

Structural RNA genes exhibit unique evolutionary patterns that conserve their secondary structures; these patterns should be taken into account when constructing accurate multiple alignments of RNA genes. The Sankoff algorithm is a natural alignment algorithm that includes the effect of base-pair covariation in the alignment model. However, the extremely high computational cost of the Sankoff algorithm precludes its application to most RNA sequences.

RESULTS

We propose an efficient algorithm for the multiple alignment of structural RNA sequences. Our algorithm is a variant of the Sankoff algorithm, and it uses an efficient scoring system that considerably reduces the time and space requirements without compromising alignment quality. First, our algorithm computes the match probability matrix, which measures the alignability of each position pair between sequences, as well as the base pairing probability matrix for each sequence. These probabilities are then combined to score the alignment using the Sankoff algorithm. By itself, our algorithm does not predict the consensus secondary structure of the alignment but uses external programs for the prediction. We demonstrate that both the alignment quality and the accuracy of the consensus secondary structure prediction from our alignment are the highest among the programs examined. We also demonstrate that our algorithm can align relatively long RNA sequences such as the eukaryotic-type signal recognition particle RNA, which is approximately 300 nt in length; multiple alignment of such sequences has not been possible with other Sankoff-based algorithms. The algorithm is implemented in the software named 'Murlet'.
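
The abstract does not give the scoring formula, so the following Python sketch is only an illustration of how a match probability matrix and per-sequence base pairing probability matrices might be combined into one alignment score. The function name `score_alignment`, the weight `alpha`, and the weighted-sum form are all assumptions made for this example; they are not taken from Murlet.

```python
def score_alignment(columns, pairs, match_prob, bp_a, bp_b, alpha=0.5):
    """Score a pairwise alignment from precomputed probabilities.

    Hypothetical illustration, not Murlet's actual scoring function.

    columns    -- list of (i, k): position i of sequence A aligned to k of B
    pairs      -- list of (c1, c2): indices into `columns` whose two aligned
                  columns form a consensus base pair
    match_prob -- match_prob[i][k], probability that i and k are aligned
    bp_a, bp_b -- base pairing probability matrices of sequences A and B
    alpha      -- assumed weight between the sequence and structure terms
    """
    # Sequence term: how alignable each aligned position pair is.
    match_term = sum(match_prob[i][k] for i, k in columns)

    # Structure term: support for the base pair in each sequence.
    pair_term = 0.0
    for c1, c2 in pairs:
        i, k = columns[c1]
        j, l = columns[c2]
        pair_term += bp_a[i][j] + bp_b[k][l]

    return alpha * match_term + (1.0 - alpha) * pair_term


# Toy input: two sequences of length 2, with invented probabilities.
match_prob = [[0.9, 0.1], [0.1, 0.8]]
bp_a = [[0.0, 0.6], [0.6, 0.0]]
bp_b = [[0.0, 0.4], [0.4, 0.0]]
columns = [(0, 0), (1, 1)]

with_pair = score_alignment(columns, [(0, 1)], match_prob, bp_a, bp_b)
without_pair = score_alignment(columns, [], match_prob, bp_a, bp_b)
```

In this toy case the alignment that includes the consensus base pair scores higher than the same alignment without it, which is the intended effect of letting structural evidence contribute to the alignment score. A full Sankoff-style method would maximise such a score by dynamic programming over all alignments and structures, rather than evaluating one fixed alignment as done here.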

AVAILABILITY

The C++ source code of the Murlet software and the test dataset used in this study are available at http://www.ncrna.org/papers/Murlet/.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
