School of Computer Science, McGill University, Montreal, QC H3A 0E9, Canada.
Center for Soft and Living Matter, Institute for Basic Science, Ulsan 44919, Republic of Korea.
Nucleic Acids Res. 2019 Apr 23;47(7):3321-3332. doi: 10.1093/nar/gkz102.
RNA structures possess multiple levels of structural organization. A secondary structure, made of Watson-Crick helices connected by loops, forms a scaffold for the tertiary structure. The 3D structures adopted by these loops are therefore critical determinants of the global 3D architecture. Earlier studies showed that these local 3D structures can be described as conserved sets of ordered non-Watson-Crick base pairs called RNA structural modules. Unfortunately, the computational efficiency and scope of current 3D module identification methods are still too limited to take full advantage of the knowledge accumulated in the module databases. We present BayesPairing, an automated, efficient and customizable tool for (i) building Bayesian networks representing RNA 3D modules and (ii) rapid identification of 3D modules in sequences. BayesPairing uses a flexible definition of RNA 3D modules that allows us to consider complex architectures such as multi-branched loops, and it features multiple algorithmic improvements. We benchmarked our methods using cross-validation on 3409 RNA chains and show that BayesPairing achieves up to ∼70% identification accuracy on module positions and base pair interactions. BayesPairing can handle a broader range of motifs (versatility) and offers considerable running time improvements (efficiency), opening the door to a broad range of large-scale applications.
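To make the two steps named above more concrete (building a Bayesian network for a module, then scanning a sequence for it), the following is a minimal toy sketch and not BayesPairing's actual implementation or API. It assumes a deliberately simplified network in which each position has at most one parent position, and it scores sequence windows by log-likelihood ratio against a uniform background. The function names, the training sequences, the `parents` mapping and the threshold are all hypothetical choices made for illustration.

```python
import math
from collections import Counter, defaultdict

ALPHABET = "ACGU"


def train_module_model(examples, parents, pseudocount=1.0):
    """Estimate conditional probability tables for a toy Bayesian network.

    examples: equal-length sequences observed for one module (hypothetical data).
    parents:  maps a position to the single position it depends on, e.g. its
              partner in a base pair; unlisted positions are independent.
    """
    length = len(examples[0])
    tables = []
    for pos in range(length):
        parent = parents.get(pos)
        counts = defaultdict(Counter)
        for seq in examples:
            # The context is the parent's nucleotide, or "" if there is no parent.
            context = seq[parent] if parent is not None else ""
            counts[context][seq[pos]] += 1
        tables.append((parent, counts))
    return {"length": length, "tables": tables, "pseudocount": pseudocount}


def log_likelihood(model, window):
    """Log-probability of a window under the network, with additive smoothing."""
    alpha = model["pseudocount"]
    total = 0.0
    for pos, (parent, counts) in enumerate(model["tables"]):
        context = window[parent] if parent is not None else ""
        seen = counts.get(context, Counter())
        numerator = seen[window[pos]] + alpha
        denominator = sum(seen.values()) + alpha * len(ALPHABET)
        total += math.log(numerator / denominator)
    return total


def scan(model, sequence, threshold):
    """Return (start, window, score) for windows whose log-odds reach the threshold."""
    n = model["length"]
    background = n * math.log(1.0 / len(ALPHABET))  # uniform background (assumption)
    hits = []
    for start in range(len(sequence) - n + 1):
        window = sequence[start:start + n]
        score = log_likelihood(model, window) - background
        if score >= threshold:
            hits.append((start, window, score))
    return hits


# Hypothetical training set: four 4-nt loop sequences, with position 3
# assumed to depend on position 0.
examples = ["GAAA", "GCAA", "GGAA", "GAGA"]
model = train_module_model(examples, parents={3: 0})
for start, window, score in scan(model, "CCGGAAACGG", threshold=2.0):
    print(start, window, round(score, 2))
```

A real module model would also have to account for secondary-structure context and multi-strand loops, which this sketch ignores entirely; it only illustrates how conditional dependencies between positions can enter a sequence score.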