Kiryu Hisanori, Kin Taishin, Asai Kiyoshi
Computational Biology Research Center, National Institute of Advanced Industrial Science and Technology (AIST), 2-42 Aomi, Koto-ku, Tokyo, Japan.
Bioinformatics. 2008 Feb 1;24(3):367-73. doi: 10.1093/bioinformatics/btm591. Epub 2007 Dec 4.
Base pairing probability matrices have been frequently used for the analyses of structural RNA sequences. Recently, there has been a growing need for computing these probabilities for long DNA sequences by constraining the maximal span of base pairs to a limited value. However, none of the existing programs can exactly compute the base pairing probabilities associated with the energy model of secondary structures under such a constraint.
We present an algorithm that exactly computes the base pairing probabilities associated with the energy model under the constraint on the maximal span W of base pairs. The complexity of our algorithm is given by O(NW2) in time and O(N+W2) in memory, where N is the sequence length. We show that our algorithm has a higher sensitivity to the true base pairs as compared to that of RNAplfold. We also present an algorithm that predicts a mutually consistent set of local secondary structures by maximizing the expected accuracy function. The comparison of the local secondary structure predictions with those of RNALfold indicates that our algorithm is more accurate. Our algorithms are implemented in the software named 'Rfold.'
The C++ source code of the Rfold software and the test dataset used in this study are available at http://www.ncrna.org/software/Rfold/.
碱基配对概率矩阵已被频繁用于分析结构性RNA序列。最近,通过将碱基对的最大跨度限制在一个有限值来计算长DNA序列的这些概率的需求日益增长。然而,现有的程序都无法在这种约束下精确计算与二级结构能量模型相关的碱基配对概率。
我们提出了一种算法,该算法能在碱基对最大跨度W的约束下精确计算与能量模型相关的碱基配对概率。我们算法的时间复杂度为O(NW²),内存复杂度为O(N + W²),其中N是序列长度。我们表明,与RNAplfold相比,我们的算法对真实碱基对具有更高的敏感性。我们还提出了一种算法,通过最大化期望准确度函数来预测一组相互一致的局部二级结构。将局部二级结构预测结果与RNALfold的结果进行比较表明,我们的算法更准确。我们的算法在名为“Rfold”的软件中实现。
Rfold软件的C++源代码以及本研究中使用的测试数据集可在http://www.ncrna.org/software/Rfold/获取。