Sperschneider Jana, Datta Amitava
School of Computer Science and Software Engineering, University of Western Australia, Perth, WA 6009, Australia.
RNA. 2008 Apr;14(4):630-40. doi: 10.1261/rna.968808. Epub 2008 Feb 26.
Pseudoknots are folded structures in RNA molecules that perform essential functions as part of cellular transcription machinery and regulatory processes. The prediction of these structures in RNA molecules has important implications in antiviral drug design. It has been shown that the prediction of pseudoknots is an NP-complete problem. Practical structure prediction algorithms based on free energy minimization employ a restricted problem class and dynamic programming. However, these algorithms are computationally very expensive, and their accuracy deteriorates if the input sequence containing the pseudoknot is too long. Heuristic methods can be more efficient, but do not guarantee an optimal solution in regards to the minimum free energy model. We present KnotSeeker, a new heuristic algorithm for the detection of pseudoknots in RNA sequences as a preliminary step for structure prediction. Our method uses a hybrid sequence matching and free energy minimization approach to perform a screening of the primary sequence. We select short sequence fragments as possible candidates that may contain pseudoknots and verify them by using an existing dynamic programming algorithm and a minimum weight independent set calculation. KnotSeeker is significantly more accurate in detecting pseudoknots compared to other common methods as reported in the literature. It is very efficient and therefore a practical tool, especially for long sequences. The algorithm has been implemented in Python and it also uses C/C++ code from several other known techniques. The code is available from http://www.csse.uwa.edu.au/~datta/pseudoknot.
假结是RNA分子中的折叠结构,作为细胞转录机制和调控过程的一部分发挥着重要作用。预测RNA分子中的这些结构在抗病毒药物设计中具有重要意义。研究表明,假结预测是一个NP完全问题。基于自由能最小化的实用结构预测算法采用了受限的问题类别和动态规划。然而,这些算法在计算上非常昂贵,并且如果包含假结的输入序列过长,其准确性会下降。启发式方法可能更有效,但不能保证在最小自由能模型方面得到最优解。我们提出了KnotSeeker,一种用于检测RNA序列中假结的新启发式算法,作为结构预测的初步步骤。我们的方法使用混合序列匹配和自由能最小化方法对一级序列进行筛选。我们选择可能包含假结的短序列片段作为候选对象,并使用现有的动态规划算法和最小权重独立集计算对其进行验证。与文献中报道的其他常用方法相比,KnotSeeker在检测假结方面的准确性要高得多。它非常高效,因此是一个实用工具,尤其适用于长序列。该算法已用Python实现,并且还使用了其他几种已知技术的C/C++代码。代码可从http://www.csse.uwa.edu.au/~datta/pseudoknot获取。