Department of Computer Science and Centre for Biological Signalling Studies (BIOSS), Albert-Ludwigs-Universität Freiburg, Germany.
Nucleic Acids Res. 2012 Jul;40(12):5215-26. doi: 10.1093/nar/gks181. Epub 2012 Feb 28.
Determining the structural properties of mRNA is key to understanding vital post-transcriptional processes. As experimental data on mRNA structure are scarce, accurate structure prediction is required to characterize RNA regulatory mechanisms. Although various structure prediction approaches are available, it is often unclear which to choose and how to set their parameters. Furthermore, no standard measure exists for comparing predictions of local structure. We assessed the performance of different methods using two types of data: transcriptome-wide enzymatic probing information and a large, curated set of cis-regulatory elements. To compare the approaches, we introduced structure accuracy, a measure that is applicable to both global and local methods. Our results showed that local folding was more accurate than the classic global approach. We investigated how the two locality parameters, maximum base pair span and window size, influenced prediction performance. A span of 150 provided a reasonable balance between maximizing the number of accurately predicted base pairs and minimizing the effects of incorrect long-range predictions. We characterized the error at artificial sequence ends, which we reduced by setting the window size sufficiently greater than the maximum span. Our method, LocalFold, diminished all border effects and produced the most robust performance.
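The two locality parameters named in the abstract can be illustrated with a minimal Python sketch. It is not LocalFold's algorithm and it does not fold anything: it only counts how many sliding windows can contain a given base pair. The function names, the 0-based indexing, the half-open windows, the step of one position, and the convention that a pair (i, j) is allowed when j - i does not exceed the span are all assumptions made for this illustration.

```python
def within_span(i, j, max_span):
    """Return True if base pair (i, j), with i < j, respects the maximum span.

    Assumed convention: a pair is allowed when j - i <= max_span.
    """
    return 0 < j - i <= max_span


def sliding_windows(seq_len, window_size, step=1):
    """Yield half-open (start, end) windows over a sequence of length seq_len.

    A sequence shorter than the window is treated as a single window.
    """
    if seq_len <= window_size:
        yield (0, seq_len)
        return
    for start in range(0, seq_len - window_size + 1, step):
        yield (start, start + window_size)


def windows_containing_pair(i, j, seq_len, window_size):
    """Count the windows that contain both positions of base pair (i, j)."""
    return sum(
        1
        for start, end in sliding_windows(seq_len, window_size)
        if start <= i and j < end
    )
```

Under these assumptions, a pair of span 150 in the interior of a 1000-position sequence lies in 50 windows of size 200, whereas the same span starting at position 0 lies in only one. As the window size approaches the span, long pairs are supported by fewer and fewer windows, which gives one intuition for choosing a window sufficiently larger than the maximum span.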