Center for Computational Natural Sciences and Bioinformatics (CCNSB), International Institute of Information Technology (IIIT-H), Gachibowli, Hyderabad 500032, India.
Computational Science Division, Saha Institute of Nuclear Physics (SINP), 1/AF, Bidhannagar, Kolkata 700064, India.
RNA. 2019 May;25(5):573-589. doi: 10.1261/rna.068551.118. Epub 2019 Feb 21.
Identification and characterization of base-multiplets, which are essentially mediated by base-pairing interactions, can provide insights into the diversity in the structure and dynamics of complex functional RNAs, and thus facilitate hypothesis-driven biological research. The necessary nomenclature scheme, an extension of the geometric classification scheme for base-pairs by Leontis and Westhof, is, however, available only for base-triplets. Because it carries no information on topology, this scheme is not applicable to quartets and higher order multiplets. Here we propose a topology-based classification scheme which, in conjunction with a graph-based algorithm, can be used for the automated identification and characterization of higher order base-multiplets in RNA structures. The RNA structure is represented as a graph, where nodes represent nucleotides and edges represent base-pairing connectivity. Connected components of n nodes within these graphs constitute subgraphs representing multiplets of n nucleotides. The different topological variants of the RNA multiplets thus correspond to different nonisomorphic forms of these subgraphs. To annotate RNA base-multiplets unambiguously, we propose a set of topology-based nomenclature rules for quartets, which are extendable to higher multiplets. We also demonstrate the utility of our approach for the identification and annotation of higher order RNA multiplets by investigating the occurrence contexts of selected examples, in order to gain insights into their probable functional roles.
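The graph representation described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the nucleotide labels, and the shape labels in `QUARTET_SHAPES` are hypothetical and are not the nomenclature proposed in the paper. The sketch assumes base pairs are already available as a list of label pairs, and it distinguishes the six nonisomorphic connected graphs on four nodes by their sorted degree sequence, a shortcut that is sufficient for quartets but not, in general, for larger multiplets.

```python
from collections import defaultdict


def build_graph(pairs):
    """Build an undirected graph: nodes are nucleotides, edges are base pairs."""
    adj = defaultdict(set)
    for a, b in pairs:
        adj[a].add(b)
        adj[b].add(a)
    return dict(adj)


def connected_components(adj):
    """Return the connected components of the graph as a list of node sets."""
    seen = set()
    components = []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack = [start]
        component = set()
        while stack:
            node = stack.pop()
            component.add(node)
            for neighbor in adj[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        components.append(component)
    return components


def multiplets(pairs, n):
    """Return the components with exactly n nucleotides (n = 4 for quartets)."""
    adj = build_graph(pairs)
    return [c for c in connected_components(adj) if len(c) == n]


# Hypothetical labels (assumption, not the paper's nomenclature).
# Each of the six connected graphs on four nodes has a unique degree sequence.
QUARTET_SHAPES = {
    (1, 1, 2, 2): "linear",
    (1, 1, 1, 3): "star",
    (2, 2, 2, 2): "cyclic",
    (1, 2, 2, 3): "triangle with pendant",
    (2, 2, 3, 3): "cyclic with one diagonal",
    (3, 3, 3, 3): "fully connected",
}


def quartet_topology(pairs, component):
    """Classify a four-node component by its sorted degree sequence."""
    adj = build_graph(pairs)
    degrees = tuple(sorted(len(adj[node] & component) for node in component))
    return QUARTET_SHAPES[degrees]


if __name__ == "__main__":
    # Made-up base pairs for illustration only.
    example_pairs = [
        ("A10", "U25"), ("U25", "G40"), ("G40", "C41"),  # one quartet
        ("G1", "C72"),                                    # an isolated pair
    ]
    for quartet in multiplets(example_pairs, 4):
        print(sorted(quartet), quartet_topology(example_pairs, quartet))
```

For higher order multiplets, the degree-sequence lookup would need to be replaced by a proper graph isomorphism test, since distinct topologies on five or more nodes can share the same degree sequence.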