Department of Chemistry, New York University, 1021 Silver, 100 Washington Square East, New York, NY 10003, USA.
Computer Science Department, College of Staten Island, City University of New York, Staten Island, New York, NY 10314, USA.
Methods. 2019 Jun 1;162-163:74-84. doi: 10.1016/j.ymeth.2019.03.022. Epub 2019 Mar 27.
Exploring novel RNA topologies is imperative for understanding RNA structure and advancing RNA design. Our RNA-As-Graphs (RAG) approach exploits graph theory tools and uses coarse-grained tree and dual graphs to represent RNA helices and loops as vertices and edges. Only dual graphs represent pseudoknotted RNAs fully. Here we develop a dual graph enumeration algorithm to generate an expanded library of dual graph topologies with 2-9 vertices, and extend our dual graph partitioning algorithm to identify all possible RNA subgraphs. Our enumeration algorithm connects smaller-vertex graphs, using all possible edge combinations, to build larger-vertex graphs, and retains all non-isomorphic graph topologies, thereby more than doubling the size of our prior library to a total of 110,667 dual graph topologies. We apply our dual graph partitioning algorithm, which keeps pseudoknots and junctions intact, to all existing RNA structures to identify all possible substructures of up to 9 vertices. In addition, our expanded dual graph library assigns graph topologies to all RNA graphs and subgraphs, rectifying prior inconsistencies. We update our RAG-3Dual database of RNA atomic fragments with all newly identified substructures and their graph IDs, increasing its size more than 50-fold. The enlarged dual graph library and RAG-3Dual database provide a comprehensive repertoire of graph topologies and atomic fragments for studying as-yet-undiscovered RNA molecules and designing RNA sequences with novel topologies, including a variety of pseudoknotted RNAs.
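The enumeration idea described in the abstract (grow larger graphs from smaller ones by trying every edge combination, then keep one representative per isomorphism class) can be illustrated with a minimal Python sketch. This is not the authors' implementation. It rests on simplifying assumptions that the abstract does not state: graphs are connected multigraphs without self-loops, every vertex has degree at most 4 (`MAX_DEGREE`), and isomorphism is detected by a brute-force canonical form that is practical only for a handful of vertices. The names `canonical`, `extend`, and `enumerate_graphs` are hypothetical.

```python
from itertools import permutations, product

# Assumption (not from the abstract): each vertex has at most four incident edges.
MAX_DEGREE = 4


def canonical(adj):
    """Relabeling-invariant key: the smallest upper triangle over all vertex orders."""
    n = len(adj)
    return min(
        tuple(adj[p[i]][p[j]] for i in range(n) for j in range(i + 1, n))
        for p in permutations(range(n))
    )


def extend(adj, max_degree=MAX_DEGREE):
    """Yield every graph obtained by attaching one new vertex to `adj`."""
    n = len(adj)
    degrees = [sum(row) for row in adj]
    # Each existing vertex can accept only as many new edges as its spare degree allows.
    choices = [range(max_degree - d + 1) for d in degrees]
    for mult in product(*choices):
        # The new vertex must be connected and must itself respect the degree cap.
        if not 1 <= sum(mult) <= max_degree:
            continue
        grown = [list(row) + [m] for row, m in zip(adj, mult)]
        grown.append(list(mult) + [0])
        yield tuple(tuple(row) for row in grown)


def enumerate_graphs(max_vertices):
    """Map each vertex count to a list of non-isomorphic connected multigraphs."""
    levels = {1: [((0,),)]}  # start from a single isolated vertex
    for n in range(2, max_vertices + 1):
        seen = {}
        for smaller in levels[n - 1]:
            for candidate in extend(smaller):
                # Keep the first graph found for each isomorphism class.
                seen.setdefault(canonical(candidate), candidate)
        levels[n] = list(seen.values())
    return levels


if __name__ == "__main__":
    for n, graphs in enumerate_graphs(4).items():
        print(n, len(graphs))
```

Attaching one vertex at a time is sufficient under these assumptions because every connected graph contains at least one vertex whose removal leaves it connected. The counts printed by this sketch will not match the published library, which follows the RAG dual graph rules rather than the simplified constraints used here. For graphs approaching 9 vertices, the permutation-based canonical form would need to be replaced by a dedicated isomorphism test.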