Zhao Jizhen, Malmberg Russell L, Cai Liming
Department of Computer Science, University of Georgia, Athens, GA 30602, USA.
J Math Biol. 2008 Jan;56(1-2):145-59. doi: 10.1007/s00285-007-0124-4. Epub 2007 Sep 29.
The prediction of RNA secondary structure including pseudoknots remains a challenge due to the intractable computation of the sequence conformation from nucleotide interactions under free energy models. Optimal algorithms often assume a restricted class for the predicted RNA structures and yet still require a high-degree polynomial time complexity, which is too expensive to use. Heuristic methods may yield time-efficient algorithms but they do not guarantee optimality of the predicted structure. This paper introduces a new and efficient algorithm for the prediction of RNA structure with pseudoknots for which the structure is not restricted. Novel prediction techniques are developed based on graph tree decomposition. In particular, based on a simplified energy model, stem overlapping relationships are defined with a graph, in which a specialized maximum independent set corresponds to the desired optimal structure. Such a graph is tree decomposable; dynamic programming over a tree decomposition of the graph leads to an efficient optimal algorithm. The final structure predictions are then based on re-ranking a list of suboptimal structures under a more comprehensive free energy model. The new algorithm is evaluated on a large number of RNA sequence sets taken from diverse resources. It demonstrates overall sensitivity and specificity that outperforms or is comparable with those of previous optimal and heuristic algorithms yet it requires significantly less time than the compared optimal algorithms.
由于在自由能模型下,从核苷酸相互作用计算序列构象的计算难度大,预测包括假结在内的RNA二级结构仍然是一个挑战。最优算法通常对预测的RNA结构假设一个受限的类别,但仍然需要高次多项式时间复杂度,使用起来成本太高。启发式方法可能会产生时间高效的算法,但它们不能保证预测结构的最优性。本文介绍了一种新的高效算法,用于预测不受结构限制的带有假结的RNA结构。基于图树分解开发了新颖的预测技术。特别地,基于简化的能量模型,用一个图定义茎重叠关系,其中一个特殊的最大独立集对应于所需的最优结构。这样的图是可树分解的;在图的树分解上进行动态规划会产生一个高效的最优算法。然后基于在更全面的自由能模型下对次优结构列表进行重新排序来进行最终的结构预测。在从不同资源获取的大量RNA序列集上对新算法进行了评估。它展示出的整体敏感性和特异性优于或与先前的最优算法和启发式算法相当,但与所比较的最优算法相比,它所需的时间要少得多。