Karan Aayush, Rivas Elena
Department of Molecular and Cellular Biology, Harvard University, Cambridge, MA 02138, USA.
bioRxiv. 2024 Dec 20:2024.12.17.628809. doi: 10.1101/2024.12.17.628809.
Structural RNAs exhibit a vast array of recurrent short 3D elements involving non-Watson-Crick interactions that help arrange canonical double helices into tertiary structures. We present CaCoFold-R3D, a probabilistic grammar that predicts these RNA 3D motifs (also termed modules) jointly with RNA secondary structure over a sequence or alignment. CaCoFold-R3D uses evolutionary information present in an RNA alignment to reliably identify canonical helices (including pseudoknots) by covariation. We further introduce the R3D grammars, which also exploit helix covariation that constrains the positioning of the mostly non-covarying RNA 3D motifs. Our method runs predictions over an almost-exhaustive list of over fifty known RNA motifs. Motifs can appear in any non-helical loop region (including 3-way, 4-way and higher junctions). All structural motifs as well as the canonical helices are arranged into one single structure predicted by one single joint probabilistic grammar. Our results demonstrate that CaCoFold-R3D is a valid alternative for predicting the all-residue interactions present in an RNA 3D structure. Furthermore, CaCoFold-R3D is fast and easily customizable for novel motif discovery.