An iterated loop matching approach to the prediction of RNA secondary structures with pseudoknots.

Author information

Ruan Jianhua, Stormo Gary D, Zhang Weixiong

Affiliation

Department of Computer Science, Washington University in St. Louis, St. Louis, MO 63130, USA.

Publication information

Bioinformatics. 2004 Jan 1;20(1):58-66. doi: 10.1093/bioinformatics/btg373.

PMID:14693809

Abstract

MOTIVATION

Pseudoknots have generally been excluded from the prediction of RNA secondary structures because they are difficult to model. Although several dynamic programming algorithms exist for predicting pseudoknots with thermodynamic approaches, they are neither reliable nor efficient. Comparative methods, on the other hand, are more reliable, but they are often applied in an ad hoc manner and require expert intervention. Maximum weighted matching, an algorithm for pseudoknot prediction based on comparative analysis, suffers from low prediction accuracy in many cases.
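Comparative methods, maximum weighted matching among them, typically weight a candidate base pair (i, j) by how strongly alignment columns i and j covary. A widely used covariation measure is the mutual information M(i, j) = sum over nucleotides x, y of f_ij(x, y) * log2(f_ij(x, y) / (f_i(x) * f_j(y))), where f are observed column frequencies. The Python sketch below computes it; the function name `column_mutual_information` and the rule of skipping gapped sequences are illustrative assumptions, not the exact scoring used in this paper.

```python
from collections import Counter
from math import log2


def column_mutual_information(alignment, i, j):
    """Mutual information (in bits) between columns i and j of an alignment.

    alignment: list of equal-length strings. Sequences with a gap ('-')
    in either column are skipped (an assumed convention).
    """
    col_pairs = [(s[i], s[j]) for s in alignment if s[i] != "-" and s[j] != "-"]
    n = len(col_pairs)
    if n == 0:
        return 0.0
    joint = Counter(col_pairs)                  # counts for f_ij(x, y)
    left = Counter(x for x, _ in col_pairs)     # counts for f_i(x)
    right = Counter(y for _, y in col_pairs)    # counts for f_j(y)
    mi = 0.0
    for (x, y), c in joint.items():
        p_xy = c / n
        mi += p_xy * log2(p_xy / ((left[x] / n) * (right[y] / n)))
    return mi


if __name__ == "__main__":
    toy = ["GAAC", "CAAG", "AAAU", "UAAA"]   # hypothetical 4-sequence alignment
    print(column_mutual_information(toy, 0, 3))
    print(column_mutual_information(toy, 1, 2))
```

In this toy alignment, columns 0 and 3 change together in every sequence and score 2 bits, while the invariant columns 1 and 2 score 0. Note that mutual information rewards any consistent covariation, not only complementary pairs, which is one reason comparative scores are often combined with pairing rules or thermodynamic terms.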

RESULTS

Here we present an algorithm, iterated loop matching, for reliably and efficiently predicting RNA secondary structures, including pseudoknots. The method can use thermodynamic information, comparative information, or both, and is thus able to predict pseudoknots for both aligned and individual sequences. We have tested the algorithm on a number of RNA families. Using 8-12 homologous sequences, the algorithm correctly identifies more than 90% of base pairs for short sequences and 80% overall. It correctly predicts nearly all pseudoknots and produces very few spurious base pairs for sequences without pseudoknots. Comparisons show that our algorithm is both more sensitive and more specific than the maximum weighted matching method. In addition, our algorithm achieves high prediction accuracy on individual sequences, comparable with that of the PKNOTS algorithm, while using far fewer computational resources.
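The abstract does not spell out the iteration itself. As a rough illustration of why an iterative helix-selection scheme can produce pseudoknots, which nested dynamic programming excludes by construction, the hypothetical Python sketch below repeatedly commits the longest remaining stem whose bases are still unpaired, without requiring stems to nest. Stem length stands in for a real thermodynamic or comparative score, and the names `find_stems`, `greedy_helix_selection`, and `crossing_pairs` are assumptions. This is a simplified greedy stand-in, not the published iterated loop matching procedure.

```python
from itertools import combinations

# Assumed pairing rules: Watson-Crick pairs plus G-U wobble.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def find_stems(seq, min_len=3, min_loop=3):
    """Return stems (i, j, length) made of stacked pairs (i+k, j-k).

    A stem is kept only if it cannot be extended outward, and its innermost
    pair leaves at least `min_loop` bases between its two sides.
    """
    n = len(seq)
    stems = []
    for i in range(n):
        for j in range(i + min_loop + 1, n):
            # Skip (i, j) if (i-1, j+1) also pairs: a longer stem covers it.
            if i > 0 and j < n - 1 and (seq[i - 1], seq[j + 1]) in PAIRS:
                continue
            length = 0
            # Extend inward while pairs are canonical and the loop stays long enough.
            while (i + length < j - length - min_loop
                   and (seq[i + length], seq[j - length]) in PAIRS):
                length += 1
            if length >= min_len:
                stems.append((i, j, length))
    return stems


def greedy_helix_selection(seq, min_len=3):
    """Iteratively commit the longest stem whose bases are all still unpaired.

    Nesting is not enforced, so the result may contain pseudoknots.
    """
    partner = {}
    stems = find_stems(seq, min_len)
    while True:
        best = None
        for i, j, length in stems:
            bases = [i + k for k in range(length)] + [j - k for k in range(length)]
            if any(b in partner for b in bases):
                continue  # conflicts with an already committed helix
            if best is None or length > best[2]:
                best = (i, j, length)
        if best is None:
            break
        i, j, length = best
        for k in range(length):
            partner[i + k] = j - k
            partner[j - k] = i + k
    return sorted((a, b) for a, b in partner.items() if a < b)


def crossing_pairs(pairs):
    """Return base-pair combinations (i, j), (k, l) with i < k < j < l."""
    return [(p, q) for p, q in combinations(sorted(pairs), 2)
            if p[0] < q[0] < p[1] < q[1]]


if __name__ == "__main__":
    pairs = greedy_helix_selection("GGGGAAACCCCUUU")  # hypothetical H-type sequence
    print(pairs)
    print(len(crossing_pairs(pairs)), "crossing combinations")
```

On the toy sequence `GGGGAAACCCCUUU`, the first pass commits the G-C stem, pairs (0, 10) through (3, 7), and the second commits the A-U stem, pairs (4, 13) through (6, 11). Every pair of the first stem crosses every pair of the second, so `crossing_pairs` reports an H-type pseudoknot. A plain hairpin such as `GGGAAAACCC` yields only nested pairs and no crossings.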

AVAILABILITY

The program has been implemented in ANSI C and is freely available for academic use at http://www.cse.wustl.edu/~zhang/projects/rna/ilm/

SUPPLEMENTARY INFORMATION

http://www.cse.wustl.edu/~zhang/projects/rna/ilm/
