Department of Computer Science and Engineering, Michigan State University, Michigan, USA.
BMC Bioinformatics. 2013;14 Suppl 2(Suppl 2):S1. doi: 10.1186/1471-2105-14-S2-S1. Epub 2013 Jan 21.
Accurate secondary structure prediction provides important information for understanding the tertiary structures, and thus the functions, of ncRNAs. However, the accuracy of native structure derivation for ncRNAs is still not satisfactory, especially on sequences containing pseudoknots. It has recently been shown that using abstract shapes, which retain the adjacency and nesting of structural features but disregard the length details of helix and loop regions, can improve the performance of structure prediction. In this work, we use SVM-based feature selection to derive the consensus abstract shape of homologous ncRNAs and apply the predicted shape to structure prediction, including pseudoknots.
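To make the notion of an abstract shape concrete, the following is a minimal sketch that collapses a dot-bracket structure into a coarse shape string in which each helix becomes one bracket pair and all length information is dropped. It is an illustration under stated assumptions, not the KnotShape implementation: the function names `pair_table` and `abstract_shape` are hypothetical, only nested structures written with `(`, `)` and `.` are handled, and pseudoknots (which would need additional bracket types) are out of scope. Writing `_` for a fully unpaired structure is also an assumed convention.

```python
def pair_table(dot_bracket):
    """Map each opening position to its closing partner (nested pairs only)."""
    stack, pairs = [], {}
    for i, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced structure: extra ')'")
            pairs[stack.pop()] = i
        elif ch != ".":
            raise ValueError(f"unsupported character: {ch!r}")
    if stack:
        raise ValueError("unbalanced structure: extra '('")
    return pairs


def abstract_shape(dot_bracket):
    """Collapse a nested dot-bracket string into a coarse abstract shape.

    Unpaired bases are ignored, and a base pair that encloses exactly one
    other pair (a stack, bulge, or internal loop) is merged into the same
    helix, so only adjacency and nesting of helices remain.
    """
    pairs = pair_table(dot_bracket)

    def region(start, end):
        # Shapes of all helices that start in the half-open range [start, end).
        shapes, k = [], start
        while k < end:
            if k in pairs:
                shapes.append(helix(k, pairs[k]))
                k = pairs[k] + 1
            else:
                k += 1
        return shapes

    def helix(i, j):
        inner = region(i + 1, j)
        if len(inner) == 1:
            # The helix continues (possibly across a bulge or internal loop).
            return inner[0]
        # Hairpin (no inner helix) or multiloop (two or more inner helices).
        return "[" + "".join(inner) + "]"

    return "".join(region(0, len(dot_bracket))) or "_"


if __name__ == "__main__":
    for structure in ["((..))", "((.(..)..))", "((..)..(..))", "(..)..(..)"]:
        print(structure, "->", abstract_shape(structure))
```

In this sketch, structures that differ only in helix or loop lengths, such as `((..))` and `((.(..)..))`, map to the same shape, which is the property that lets homologous sequences share a consensus shape.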
Our approach was applied to predict shapes and secondary structures on hundreds of ncRNA data sets with and without pseudoknots. The experimental results show that we can achieve 18% higher accuracy in shape prediction than state-of-the-art consensus shape prediction tools. Using predicted shapes in structure prediction allows us to achieve approximately 29% higher sensitivity and 10% higher positive predictive value than other pseudoknot prediction tools.
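For readers unfamiliar with the two metrics, the sketch below computes base-pair sensitivity, TP / (TP + FN), and positive predictive value, TP / (TP + FP), from a reference and a predicted structure. These are the standard definitions; the exact evaluation protocol used in the paper is not given in this passage, so treating a base pair as correct only when both positions match exactly is an assumption, as are the helper names `base_pairs` and `sensitivity_ppv`. Only nested dot-bracket input is handled.

```python
def base_pairs(dot_bracket):
    """Return the set of (i, j) base pairs in a nested dot-bracket string."""
    stack, pairs = [], set()
    for i, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced structure: extra ')'")
            pairs.add((stack.pop(), i))
    if stack:
        raise ValueError("unbalanced structure: extra '('")
    return pairs


def sensitivity_ppv(reference, predicted):
    """Compute base-pair sensitivity and positive predictive value (PPV).

    Assumption: a predicted pair counts as a true positive only if the
    identical pair (i, j) occurs in the reference structure.
    """
    ref, pred = base_pairs(reference), base_pairs(predicted)
    tp = len(ref & pred)   # pairs present in both structures
    fn = len(ref - pred)   # reference pairs that were missed
    fp = len(pred - ref)   # predicted pairs absent from the reference
    sensitivity = tp / (tp + fn) if ref else 0.0
    ppv = tp / (tp + fp) if pred else 0.0
    return sensitivity, ppv


if __name__ == "__main__":
    # The prediction recovers one of the two reference pairs and adds no wrong ones.
    print(sensitivity_ppv("((..))", "(....)"))
```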
Extensive SVM-based analysis of RNA properties allows us to identify important properties of sequences and structures that are related to their shapes. The combination of large-scale data analysis and SVM-based feature selection makes our approach a promising method for shape and structure prediction. The implemented tools, KnotShape and KnotStructure, are open-source software and can be downloaded at: http://www.cse.msu.edu/~achawana/KnotShape.