Jin Emma Yu, Nebel Markus E
University of Kaiserslautern, Kaiserslautern, Germany.
University of Southern Denmark, Odense, Denmark.
J Math Biol. 2016 Feb;72(3):527-71. doi: 10.1007/s00285-015-0894-z. Epub 2015 May 23.
Various tools used to predict the secondary structure for a given RNA sequence are based on dynamic programming used to compute a conformation of minimum free energy. For structures without pseudoknots, a worst-case runtime proportional to n3, with n being the length of the sequence, results since a table of dimension n2 has to be filled in while a single entry gives rise to a linear computational effort. However, it was recently observed that reformulating the corresponding dynamic programming recursion together with the bookkeeping of potential folding alternatives (a technique called sparsification) may reduce the runtime to n2 on average, assuming that nucleotides of distance d form a hydrogen bond (i..e., are paired) with probability b/d(c) for some constants b > 0, c > 1. The latter is called the polymer-zeta model and plays a crucial role in speeding up the above mentioned algorithm. In this paper we discuss the application of the polymer-zeta property for the analysis of sparsification, showing that it must be applied conditionally on first and last positions to pair. Afterwards, we will investigate the combinatorics of RNA secondary structures assuming that the corresponding conditional probabilities behave according to a polymer-zeta probability model. We show that even if some of the structural parameters exhibit an almost realistic behavior on average, the expected shape of a folding in that model must be assumed to highly differ from those observed in nature. More precisely, we prove our polymer-zeta model to be appropriate for mRNA molecules but to fail in connection with almost every other family of RNA. Those findings explain the huge speedup of the dynamic programming algorithm observed empirically by Wexler et al. when applying sparsification in connection with mRNA data.
用于预测给定RNA序列二级结构的各种工具都是基于动态规划,用于计算自由能最小的构象。对于没有假结的结构,由于要填充一个n2维的表格,而单个条目会产生线性计算量,因此最坏情况下的运行时间与n3成正比,其中n是序列的长度。然而,最近有人观察到,重新构建相应的动态规划递归以及记录潜在的折叠替代方案(一种称为稀疏化的技术),平均而言可以将运行时间减少到n2,假设距离为d的核苷酸以概率b/d(c)形成氢键(即配对),其中b>0,c>1为一些常数。后者被称为聚合物zeta模型,在加速上述算法中起着关键作用。在本文中,我们讨论了聚合物zeta性质在稀疏化分析中的应用,表明它必须在配对的第一个和最后一个位置有条件地应用。之后,我们将研究RNA二级结构的组合学,假设相应的条件概率符合聚合物zeta概率模型。我们表明,即使一些结构参数平均表现出几乎逼真的行为,该模型中折叠的预期形状也必须假定与自然界中观察到的形状有很大不同。更确切地说,我们证明我们的聚合物zeta模型适用于mRNA分子,但与几乎所有其他RNA家族结合时都会失败。这些发现解释了Wexler等人在将稀疏化与mRNA数据结合应用时通过经验观察到的动态规划算法的巨大加速。