Department of Biochemistry & Biophysics, University of Rochester Medical Center, Rochester, NY 14642, USA.
Center for RNA Biology, University of Rochester Medical Center, Rochester, NY 14642, USA.
Nucleic Acids Res. 2018 Jan 9;46(1):314-323. doi: 10.1093/nar/gkx1057.
RNA secondary structure prediction is widely used for developing hypotheses about the structures of RNA sequences, and structure can provide insight into RNA function. The accuracy of structure prediction is known to improve with experimental mapping data that report on the pairing status of individual nucleotides, and these data can now be acquired for whole transcriptomes using high-throughput sequencing. Prior methods for using these experimental data focused on predicting structures for sequences under the assumption that each populates a single structure. Most RNAs populate multiple structures, however, where the ensemble of strands populates structures with different sets of canonical base pairs. The focus on modeling single structures has been a bottleneck for accurately modeling RNA structure. In this work, we introduce Rsample, an algorithm for using experimental data to predict more than one RNA structure for sequences that populate multiple structures at equilibrium. We demonstrate, using SHAPE mapping data, that we can accurately model RNA sequences that populate multiple structures, including the relative probabilities of those structures. This program is freely available as part of the RNAstructure software package.