Jabbari Hosna, Condon Anne, Zhao Shelly
Department of Computer Science, University of British Columbia, Vancouver, British Columbia, Canada.
J Comput Biol. 2008 Mar;15(2):139-63. doi: 10.1089/cmb.2007.0198.
Algorithms for prediction of RNA secondary structure-the set of base pairs that form when an RNA molecule folds-are valuable to biologists who aim to understand RNA structure and function. Improving the accuracy and efficiency of prediction methods is an ongoing challenge, particularly for pseudoknotted secondary structures, in which base pairs overlap. This challenge is biologically important, since pseudoknotted structures play essential roles in functions of many RNA molecules, such as splicing and ribosomal frameshifting. State-of-the-art methods, which are based on free energy minimization, have high run-time complexity (typically Theta(n(5)) or worse), and can handle (minimize over) only limited types of pseudoknotted structures. We propose a new approach for prediction of pseudoknotted structures, motivated by the hypothesis that RNA structures fold hierarchically, with pseudoknot-free (non-overlapping) base pairs forming first, and pseudoknots forming later so as to minimize energy relative to the folded pseudoknot-free structure. Our HFold algorithm uses two-phase energy minimization to predict hierarchically formed secondary structures in O(n(3)) time, matching the complexity of the best algorithms for pseudoknot-free secondary structure prediction via energy minimization. Our algorithm can handle a wide range of biological structures, including kissing hairpins and nested kissing hairpins, which have previously required Theta(n(6)) time.
RNA二级结构预测算法——RNA分子折叠时形成的碱基对集合——对于旨在理解RNA结构和功能的生物学家来说非常有价值。提高预测方法的准确性和效率是一个持续存在的挑战,特别是对于碱基对重叠的假结二级结构而言。这一挑战在生物学上很重要,因为假结结构在许多RNA分子的功能中发挥着重要作用,如剪接和核糖体移码。基于自由能最小化的现有方法具有很高的运行时复杂度(通常为Theta(n(5))或更糟),并且只能处理(最小化)有限类型的假结结构。我们提出了一种预测假结结构的新方法,其动机基于这样的假设:RNA结构分层折叠,首先形成无假结(不重叠)的碱基对,然后形成假结,以便相对于折叠后的无假结结构使能量最小化。我们的HFold算法使用两阶段能量最小化来在O(n(3))时间内预测分层形成的二级结构,与通过能量最小化预测无假结二级结构的最佳算法的复杂度相匹配。我们的算法可以处理广泛的生物结构,包括之前需要Theta(n(6))时间的亲吻发夹和嵌套亲吻发夹。