Bioinformatics Center, Institute for Chemical Research, Kyoto University, Kyoto, Japan.
J Comput Biol. 2020 Sep;27(9):1443-1451. doi: 10.1089/cmb.2019.0512. Epub 2020 Feb 14.
Comparing RNA structures is one of the most crucial analyses for elucidating their individual functions and promoting medical applications. Because RNA function and structure are widely accepted to be strongly correlated, and because predicting RNA three-dimensional structure directly from sequence is difficult, various methods for RNA secondary structure analysis have been proposed. However, few methods handle RNA secondary structures that contain a specific and complex partial structure called a pseudoknot, despite its significance to biological processes, and this is a major obstacle to analyzing their functions. In this study, we propose a novel tree representation of pseudoknotted RNA secondary structures based on topological centroid identification, together with methods for comparing them based on the tree edit distance. In the proposed method, a given graph representing an RNA secondary structure is transformed into a tree rooted at one of the vertices constituting the topological centroid, which is identified by removing cycles through a peeling process applied to the graph. When tree-represented RNA secondary structures collected from a public database were compared using the tree edit distance and functional gene groups defined by Gene Ontology (GO), the proposed method produced clusters that agreed better with their GO annotations than canonical RNA sequence-based comparison. In addition, we report a case in which combining the tree edit distance with the sequence edit distance yields a better classification of pseudoknotted RNA secondary structures.
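The pipeline described in the abstract (structure graph, peeling to a centroid, rooted tree, tree edit distance) can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several details are assumptions made only for illustration: the input is taken to be extended dot-bracket notation with `()`, `[]`, `{}` and `<>`; the peeling rule shown here (repeatedly stripping all vertices of currently minimal degree until the remaining vertices share one degree) stands in for the paper's peeling process, whose exact rule the abstract does not give; the tree is a breadth-first spanning tree with children ordered by sequence position; nodes are labeled by nucleotide; and the distance is a unit-cost ordered tree edit distance computed with a plain memoized recursion. All function names are hypothetical.

```python
from collections import deque
from functools import lru_cache

# Assumed bracket alphabet for pseudoknotted dot-bracket strings.
PAIRS = {")": "(", "]": "[", "}": "{", ">": "<"}


def structure_graph(db):
    """Adjacency sets: backbone edges i-(i+1) plus one edge per base pair."""
    adj = {i: set() for i in range(len(db))}
    for i in range(len(db) - 1):
        adj[i].add(i + 1)
        adj[i + 1].add(i)
    stacks = {opener: [] for opener in PAIRS.values()}
    for i, ch in enumerate(db):
        if ch in stacks:
            stacks[ch].append(i)
        elif ch in PAIRS:
            j = stacks[PAIRS[ch]].pop()
            adj[i].add(j)
            adj[j].add(i)
    return adj


def peel_to_core(adj):
    """Assumed peeling rule: strip all minimum-degree vertices, layer by
    layer, and return the last layer in which every vertex has equal degree."""
    live = {v: set(ns) for v, ns in adj.items()}
    while True:
        low = min(len(ns) for ns in live.values())
        layer = {v for v, ns in live.items() if len(ns) == low}
        if len(layer) == len(live):
            return sorted(live)
        for v in layer:
            for u in live.pop(v):
                live.get(u, set()).discard(v)


def rooted_tree(adj, root, seq):
    """BFS spanning tree from root as nested (label, children) tuples."""
    children = {v: [] for v in adj}
    seen, queue = {root}, deque([root])
    while queue:
        v = queue.popleft()
        for u in sorted(adj[v]):
            if u not in seen:
                seen.add(u)
                children[v].append(u)
                queue.append(u)

    def build(v):
        return (seq[v], tuple(build(c) for c in children[v]))

    return build(root)


def size(forest):
    """Number of nodes in an ordered forest."""
    return sum(1 + size(kids) for _, kids in forest)


@lru_cache(maxsize=None)
def forest_dist(f1, f2):
    """Unit-cost edit distance between ordered forests (rightmost-root recursion)."""
    if not f1 or not f2:
        return size(f1) + size(f2)
    (a, ka), (b, kb) = f1[-1], f2[-1]
    return min(
        forest_dist(f1[:-1] + ka, f2) + 1,  # delete rightmost root of f1
        forest_dist(f1, f2[:-1] + kb) + 1,  # insert rightmost root of f2
        forest_dist(ka, kb) + forest_dist(f1[:-1], f2[:-1]) + (a != b),  # match or relabel
    )


def tree_edit_distance(t1, t2):
    return forest_dist((t1,), (t2,))


def structure_to_tree(db, seq):
    """Root the tree at the lowest-index vertex of the peeled core."""
    adj = structure_graph(db)
    return rooted_tree(adj, peel_to_core(adj)[0], seq)


hairpin = structure_to_tree("((...))", "GGAAACC")
knot = structure_to_tree("((..[[..))..]]", "GGAACCUUCCAAGG")
print(tree_edit_distance(hairpin, knot))
```

The memoized recursion is simple rather than fast; for structures of realistic length, a Zhang-Shasha style dynamic program would be the more usual choice.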