Andronescu Mirela, Condon Anne, Hoos Holger H, Mathews David H, Murphy Kevin P
Department of Computer Science, University of British Columbia, Vancouver BC V6T 1Z4, Canada.
Bioinformatics. 2007 Jul 1;23(13):i19-28. doi: 10.1093/bioinformatics/btm223.
Accurate prediction of RNA secondary structure from the base sequence is an unsolved computational challenge. The accuracy of predictions made by free energy minimization is limited by the quality of the energy parameters in the underlying free energy model. The most widely used model, the Turner99 model, has hundreds of parameters, and so a robust parameter estimation scheme should efficiently handle large data sets with thousands of structures. Moreover, the estimation scheme should also be trained using available experimental free energy data in addition to structural data.
In this work, we present constraint generation (CG), the first computational approach to RNA free energy parameter estimation that can be efficiently trained on large sets of structural as well as thermodynamic data. Our CG approach employs a novel iterative scheme, whereby the energy values are first computed as the solution to a constrained optimization problem. Then the newly computed energy parameters are used to update the constraints on the optimization function, so as to better optimize the energy parameters in the next iteration. Using our method on biologically sound data, we obtain revised parameters for the Turner99 energy model. We show that by using our new parameters, we obtain significant improvements in prediction accuracy over current state-of-the-art methods.
Our CG implementation is available at http://www.rnasoft.ca/CG/.
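The iterative scheme described above can be illustrated with a small toy sketch. It is not the authors' implementation, and every name, feature, and number in it is a hypothetical assumption made for illustration. The energy model is assumed to be linear in two made-up feature counts (stacked pairs and loops), structure prediction is replaced by picking the lowest-energy entry from a short hand-written list of alternative structures, and the constrained optimization step is approximated by cyclic projection onto the violated constraints rather than by a proper solver. Thermodynamic data, which the abstract also mentions, is left out.

```python
# Toy constraint-generation loop. All names and data are hypothetical.
# Structures are feature-count vectors: [stacked pairs, loops].

def energy(theta, feats):
    """Assumed linear energy model: parameters dotted with feature counts."""
    return sum(t * f for t, f in zip(theta, feats))


def lowest_energy_alternative(theta, alternatives):
    """Stand-in for structure prediction: the lowest-energy listed rival."""
    return min(alternatives, key=lambda feats: energy(theta, feats))


def solve(theta0, constraints, margin, sweeps=5000, tol=1e-12):
    """Approximate the constrained step by cyclic projection.

    Each constraint d = known - rival demands d . theta <= -margin, i.e.
    the known structure is lower in energy than the rival by the margin.
    This finds a feasible point near theta0, not an exact optimum.
    """
    theta = list(theta0)
    for _ in range(sweeps):
        changed = False
        for d in constraints:
            violation = energy(theta, d) + margin
            if violation > tol:
                step = violation / sum(x * x for x in d)
                theta = [t - step * x for t, x in zip(theta, d)]
                changed = True
        if not changed:
            break
    return theta


def constraint_generation(examples, theta0, margin=0.5, max_rounds=20):
    """Alternate between finding violated constraints and re-solving."""
    theta, constraints = list(theta0), []
    for round_no in range(1, max_rounds + 1):
        violated = False
        for known, alternatives in examples:
            rival = lowest_energy_alternative(theta, alternatives)
            if energy(theta, known) + margin > energy(theta, rival) + 1e-9:
                violated = True
                d = tuple(k - r for k, r in zip(known, rival))
                if d not in constraints:
                    constraints.append(d)  # new constraint for next solve
        if not violated:
            return theta, round_no, True
        theta = solve(theta0, constraints, margin)
    return theta, max_rounds, False


# Hypothetical training set: (known structure, alternative structures).
# Alternatives must differ from the known structure in their features.
EXAMPLES = [
    ([3, 1], [[0, 0], [1, 1], [5, 3], [2, 2]]),
    ([2, 1], [[0, 0], [4, 3], [1, 1]]),
]
THETA0 = [-1.0, -1.0]  # deliberately poor starting parameters

if __name__ == "__main__":
    theta, rounds, converged = constraint_generation(EXAMPLES, THETA0)
    print("parameters:", theta, "rounds:", rounds, "converged:", converged)
```

In this toy setting, the first round only sees the rivals that the poor starting parameters favor, and the re-solved parameters then expose new rivals in the next round. That is the point of regenerating constraints with the updated parameters instead of listing every alternative structure up front. A realistic version would replace `lowest_energy_alternative` with a real folding algorithm and `solve` with a proper constrained optimizer.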