Department of Mathematics and Computer Science, Denison University, Granville, Ohio, United States of America.
PLoS One. 2011 Jan 28;6(1):e16178. doi: 10.1371/journal.pone.0016178.
An RNA secondary structure is locally optimal if there is no lower energy structure that can be obtained by the addition or removal of a single base pair, where energy is defined according to the widely accepted Turner nearest neighbor model. Locally optimal structures form kinetic traps, since any evolution away from a locally optimal structure must involve energetically unfavorable folding steps. Here, we present a novel, efficient algorithm to compute the partition function over all locally optimal secondary structures of a given RNA sequence. Our software, RNAlocopt runs in O(n3) time and O(n2) space. Additionally, RNAlocopt samples a user-specified number of structures from the Boltzmann subensemble of all locally optimal structures. We apply RNAlocopt to show that (1) the number of locally optimal structures is far fewer than the total number of structures--indeed, the number of locally optimal structures approximately equal to the square root of the number of all structures, (2) the structural diversity of this subensemble may be either similar to or quite different from the structural diversity of the entire Boltzmann ensemble, a situation that depends on the type of input RNA, (3) the (modified) maximum expected accuracy structure, computed by taking into account base pairing frequencies of locally optimal structures, is a more accurate prediction of the native structure than other current thermodynamics-based methods. The software RNAlocopt constitutes a technical breakthrough in our study of the folding landscape for RNA secondary structures. For the first time, locally optimal structures (kinetic traps in the Turner energy model) can be rapidly generated for long RNA sequences, previously impossible with methods that involved exhaustive enumeration. Use of locally optimal structure leads to state-of-the-art secondary structure prediction, as benchmarked against methods involving the computation of minimum free energy and of maximum expected accuracy. Web server and source code available at http://bioinformatics.bc.edu/clotelab/RNAlocopt/.
如果没有通过添加或删除单个碱基对可以获得更低能量的结构,那么 RNA 的二级结构就是局部最优的,其中能量是根据广泛接受的 Turner 最近邻模型定义的。局部最优结构形成动力学陷阱,因为任何远离局部最优结构的进化都必须涉及能量不利的折叠步骤。在这里,我们提出了一种新颖、高效的算法,用于计算给定 RNA 序列的所有局部最优二级结构的配分函数。我们的软件 RNAlocopt 在 O(n3)时间和 O(n2)空间中运行。此外,RNAlocopt 从所有局部最优结构的 Boltzmann 子集中采样用户指定数量的结构。我们应用 RNAlocopt 来表明:(1) 局部最优结构的数量远远少于结构的总数——实际上,局部最优结构的数量大约等于所有结构数量的平方根,(2) 这个子集中的结构多样性可能与整个 Boltzmann 集合的结构多样性相似,也可能截然不同,这种情况取决于输入 RNA 的类型,(3) 考虑到局部最优结构的碱基配对频率,计算出的(修改后的)最大期望准确性结构比其他当前基于热力学的方法更能准确预测天然结构。软件 RNAlocopt 在我们对 RNA 二级结构折叠景观的研究中是一个技术突破。对于以前使用涉及穷举的方法不可能处理的长 RNA 序列,现在可以快速生成局部最优结构(在 Turner 能量模型中的动力学陷阱)。使用局部最优结构可以实现最先进的二级结构预测,其基准是涉及最小自由能和最大期望准确性计算的方法。Web 服务器和源代码可在 http://bioinformatics.bc.edu/clotelab/RNAlocopt/ 获得。