Rogers Emily, Murrugarra David, Heitsch Christine
School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, Georgia.
Department of Mathematics, University of Kentucky, Lexington, Kentucky.
Biophys J. 2017 Jul 25;113(2):321-329. doi: 10.1016/j.bpj.2017.05.026. Epub 2017 Jun 16.
Understanding how RNA secondary structure prediction methods depend on the underlying nearest-neighbor thermodynamic model remains a fundamental challenge in the field. Minimum free energy (MFE) predictions are known to be "ill conditioned" in that small changes to the thermodynamic model can result in significantly different optimal structures. Hence, the best practice is now to sample from the Boltzmann distribution, which generates a set of suboptimal structures. Although the structural signal of this Boltzmann sample is known to be robust to stochastic noise, its conditioning and robustness under thermodynamic perturbations have yet to be addressed. We present here a mathematically rigorous model for conditioning inspired by numerical analysis, and also a biologically inspired definition for robustness under thermodynamic perturbation. We demonstrate the strong correlation between conditioning and robustness and use this tight relationship to define quantitative thresholds for well versus ill conditioning. The resulting thresholds demonstrate that the majority of the sequences are at least sample robust, which verifies the assumption that sampling is better conditioned than the MFE prediction. Furthermore, because we find no correlation between conditioning and MFE accuracy, the presence of both well- and ill-conditioned sequences indicates the continued need for both thermodynamic model refinements and alternate RNA structure prediction methods beyond the physics-based ones.
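The contrast between an ill-conditioned MFE prediction and a better-conditioned Boltzmann sample can be illustrated with a toy calculation. The sketch below is not the conditioning model defined in the paper: the three free energies are hypothetical, the perturbation size is arbitrary, and total variation distance is an assumed stand-in for a measure of how much the distribution moves. It only shows that a 0.1 kcal/mol change can flip the single optimal structure while leaving the Boltzmann probabilities nearly unchanged.

```python
import math

# Assumption: RT at 37 degrees C, using R = 0.0019872 kcal/(mol K) and T = 310.15 K.
RT = 0.0019872 * 310.15


def boltzmann_probabilities(energies, rt=RT):
    """Return Boltzmann probabilities for a list of free energies in kcal/mol."""
    e_min = min(energies)  # shift by the minimum for numerical stability
    weights = [math.exp(-(e - e_min) / rt) for e in energies]
    partition = sum(weights)
    return [w / partition for w in weights]


def mfe_index(energies):
    """Return the index of the minimum free energy structure."""
    return min(range(len(energies)), key=energies.__getitem__)


def total_variation(p, q):
    """Total variation distance between two discrete distributions."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


# Hypothetical free energies (kcal/mol) for three candidate structures.
original = [-10.0, -9.9, -7.0]
# Hypothetical perturbed model: two energies shift by 0.1 kcal/mol.
perturbed = [-9.9, -10.0, -7.0]

p_original = boltzmann_probabilities(original)
p_perturbed = boltzmann_probabilities(perturbed)

print("MFE structure before:", mfe_index(original))
print("MFE structure after: ", mfe_index(perturbed))
print("Total variation distance:", total_variation(p_original, p_perturbed))
```

In this toy setting the optimal structure changes identity under the perturbation, whereas the two low-energy structures keep roughly comparable probabilities in both distributions, so a sample drawn from either would contain a similar mix of structures.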