Krueger Ryan K, Aviran Sharon, Mathews David H, Zuber Jeffrey, Ward Max
ArXiv. 2025 May 12:arXiv:2503.09085v2.
The Nearest Neighbor model is the $\textit{de facto}$ thermodynamic model of RNA secondary structure formation and is a cornerstone of RNA structure prediction and sequence design. The current functional form (Turner 2004) contains $\approx 13{,}000$ underlying thermodynamic parameters, and fitting these to both experimental and structural data is computationally challenging. Here, we leverage recent advances in $\textit{differentiable folding}$, a method for directly computing gradients of the RNA folding algorithms, to devise an efficient, scalable, and flexible means of parameter optimization that uses known RNA structures and thermodynamic experiments. Our method yields a significantly improved parameter set that outperforms existing baselines on all metrics, including an increase in the average predicted probability of ground-truth sequence-structure pairs for a single RNA family by over 23 orders of magnitude. Our framework provides a path towards drastically improved RNA models, enabling the flexible incorporation of new experimental data, definition of novel loss terms, large training sets, and even treatment as a module in larger deep learning pipelines. We make available a new database, RNAometer, with experimentally determined stabilities for small RNA model systems.
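To illustrate what "computing gradients of the RNA folding algorithms" can look like, the following is a minimal sketch in JAX and not the authors' implementation. It assumes a toy Nussinov-style energy model with only three made-up parameters (one energy each for GC, AU, and GU pairs) in place of the Turner 2004 Nearest Neighbor model, and a hypothetical minimum hairpin size of three unpaired bases. The names `partition`, `log_prob`, `PAIR_INDEX`, and the example sequence are all illustrative assumptions. The partition function is computed by dynamic programming, and `jax.grad` differentiates the log-probability of a known structure with respect to the parameters.

```python
import jax
import jax.numpy as jnp

KT = 0.6163        # approximate RT in kcal/mol at 37 °C
MIN_HAIRPIN = 3    # assumed minimum number of unpaired bases in a hairpin

# Toy parameterization (assumption): one energy per base-pair type.
PAIR_INDEX = {("G", "C"): 0, ("C", "G"): 0,
              ("A", "U"): 1, ("U", "A"): 1,
              ("G", "U"): 2, ("U", "G"): 2}

def partition(theta, seq):
    """Partition function over all nested structures of seq (toy model)."""
    n = len(seq)
    boltz = jnp.exp(-theta / KT)   # Boltzmann weight of each pair type
    Z = {}

    def get(i, j):
        # An empty subsequence has exactly one (empty) structure.
        return Z[(i, j)] if i <= j else 1.0

    for length in range(1, n + 1):
        for i in range(0, n - length + 1):
            j = i + length - 1
            total = get(i + 1, j)                      # base i unpaired
            for k in range(i + MIN_HAIRPIN + 1, j + 1):
                p = PAIR_INDEX.get((seq[i], seq[k]))
                if p is not None:                      # base i paired with k
                    total = total + boltz[p] * get(i + 1, k - 1) * get(k + 1, j)
            Z[(i, j)] = total
    return get(0, n - 1)

def pairs_of(dot_bracket):
    """Return the list of (i, j) base pairs encoded by a dot-bracket string."""
    stack, pairs = [], []
    for j, c in enumerate(dot_bracket):
        if c == "(":
            stack.append(j)
        elif c == ")":
            pairs.append((stack.pop(), j))
    return pairs

def log_prob(theta, seq, dot_bracket):
    """Log Boltzmann probability of one structure under the toy model."""
    energy = sum(theta[PAIR_INDEX[(seq[i], seq[j])]]
                 for i, j in pairs_of(dot_bracket))
    return -energy / KT - jnp.log(partition(theta, seq))

# Made-up starting parameters (kcal/mol) and a made-up training example.
theta = jnp.array([-3.0, -2.0, -1.0])   # GC, AU, GU
seq, target = "GGGAAAUCCC", "(((....)))"

lp_before = log_prob(theta, seq, target)
grad = jax.grad(log_prob)(theta, seq, target)   # d log p / d theta

# One gradient-ascent step on the log-probability of the known structure.
theta_new = theta + 0.1 * grad
lp_after = log_prob(theta_new, seq, target)
```

In this toy setting the gradient with respect to each parameter equals the expected number of pairs of that type in the ensemble minus the number in the target structure, divided by `KT`, so the update stabilizes pair types the target uses more than the ensemble average. A realistic fit would replace the toy energy function with the full Nearest Neighbor recursions and combine many structures and thermodynamic measurements in the loss.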