Zirbel Craig L, Roll James, Sweeney Blake A, Petrov Anton I, Pirrung Meg, Leontis Neocles B
Department of Mathematics and Statistics, Bowling Green State University, Bowling Green, OH 43403, USA
Nucleic Acids Res. 2015 Sep 3;43(15):7504-20. doi: 10.1093/nar/gkv651. Epub 2015 Jun 29.
Predicting RNA 3D structure from sequence is a major challenge in biophysics. An important sub-goal is accurately identifying recurrent 3D motifs from RNA internal and hairpin loop sequences extracted from secondary structure (2D) diagrams. We have developed and validated new probabilistic models for 3D motif sequences based on hybrid Stochastic Context-Free Grammars and Markov Random Fields (SCFG/MRF). The SCFG/MRF models are constructed using atomic-resolution RNA 3D structures. To parameterize each model, we use all instances of each motif found in the RNA 3D Motif Atlas and annotations of pairwise nucleotide interactions generated by the FR3D software. Isostericity relations between non-Watson-Crick basepairs are used in scoring sequence variants. SCFG techniques model nested pairs and insertions, while MRF ideas handle crossing interactions and base triples. We use test sets of randomly generated sequences to set acceptance and rejection thresholds for each motif group and thus control the false positive rate. Validation was carried out by comparing results for four motif groups to RMDetect. The software developed for sequence scoring (JAR3D) is structured to automatically incorporate new motifs as they accumulate in the RNA 3D Motif Atlas when new structures are solved, and it is available free for download.
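The threshold-setting step can be illustrated with a short sketch. It is a minimal illustration under stated assumptions, not JAR3D's implementation. It swaps in a toy position-specific log-odds model for a real SCFG/MRF motif model, and all names (MOTIF_PROFILE, score, random_sequences, calibrate_threshold, accepts) and probabilities are hypothetical. Random decoy sequences are scored, and a single acceptance cutoff is placed at the empirical (1 - alpha) quantile of their scores. With this rule, at most a fraction alpha of the decoys score above the cutoff. The abstract describes separate acceptance and rejection thresholds, but only one cutoff is shown here.

```python
import math
import random

# Hypothetical toy motif model: per-position nucleotide probabilities for a
# fixed-length loop. It stands in for a real SCFG/MRF model and is used only
# to show how a cutoff can be calibrated against random sequences.
MOTIF_PROFILE = [
    {"A": 0.1, "C": 0.1, "G": 0.7, "U": 0.1},
    {"A": 0.7, "C": 0.1, "G": 0.1, "U": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.1, "U": 0.7},
    {"A": 0.7, "C": 0.1, "G": 0.1, "U": 0.1},
]
BACKGROUND = 0.25  # uniform background probability for A, C, G, U


def score(seq):
    """Log-odds of seq under the toy profile versus a uniform background."""
    # fsum gives a correctly rounded sum, so equal multisets of terms tie exactly.
    return math.fsum(math.log(col[nt] / BACKGROUND) for col, nt in zip(MOTIF_PROFILE, seq))


def random_sequences(n, length, rng):
    """Generate n uniformly random RNA sequences of the given length."""
    return ["".join(rng.choice("ACGU") for _ in range(length)) for _ in range(n)]


def calibrate_threshold(random_scores, target_fpr):
    """Return a cutoff that at most target_fpr of the random scores exceed."""
    ranked = sorted(random_scores)
    # Index of the empirical (1 - target_fpr) quantile, clipped to the last element.
    k = min(len(ranked) - 1, math.ceil((1 - target_fpr) * len(ranked)))
    return ranked[k]


def accepts(seq, threshold):
    """Accept a candidate loop sequence only if it scores strictly above the cutoff."""
    return score(seq) > threshold


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed for reproducibility
    decoys = random_sequences(10000, len(MOTIF_PROFILE), rng)
    cutoff = calibrate_threshold([score(s) for s in decoys], target_fpr=0.01)
    print("cutoff:", round(cutoff, 3))
    print("GAUA accepted:", accepts("GAUA", cutoff))
    print("CCCC accepted:", accepts("CCCC", cutoff))
```

Calibrating against decoys ties the cutoff directly to a chosen false positive rate rather than to an arbitrary score value. A real model would generate decoys whose lengths match each motif group's loops.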