Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA.
RNA. 2010 Dec;16(12):2304-18. doi: 10.1261/rna.1950510. Epub 2010 Oct 12.
Methods for efficient and accurate prediction of RNA structure are increasingly valuable, given the current rapid advances in understanding the diverse functions of RNA molecules in the cell. To enhance the accuracy of secondary structure predictions, we developed and refined optimization techniques for the estimation of energy parameters. We build on two previous approaches to RNA free-energy parameter estimation: (1) the Constraint Generation (CG) method, which iteratively generates constraints requiring each known structure to have a lower energy than other structures of the same molecule; and (2) the Boltzmann Likelihood (BL) method, which infers a set of RNA free-energy parameters that maximizes the conditional likelihood of a set of reference RNA structures. Here, we extend these approaches in two main ways: we propose (1) a max-margin extension of CG, and (2) a novel linear Gaussian Bayesian network that models relationships between features and makes effective use of sparse data by sharing statistical strength between parameters. We obtain significant improvements in the accuracy of minimum free-energy prediction of pseudoknot-free RNA secondary structures, measured on a comprehensive set of 2518 RNA molecules with reference structures. Our parameters can be used in conjunction with software that predicts RNA secondary structures, RNA hybridization, or ensembles of structures. Our data, software, results, and parameter sets in various formats are freely available at http://www.cs.ubc.ca/labs/beta/Projects/RNA-Params.
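The two criteria named in the abstract can be illustrated with a minimal sketch. It assumes a linear energy model (energy = sum of parameter × feature count) and a small, explicitly enumerated set of candidate structures; real implementations sum over all structures by dynamic programming. The function names, the two toy features, and all numeric values are hypothetical and are not taken from the paper or its software.

```python
import math

# Assumed thermal energy in kcal/mol at 37 °C (R = 1.9872e-3 kcal/(mol K), T = 310.15 K).
RT = 1.9872e-3 * 310.15


def energy(theta, counts):
    """Free energy under a linear model: sum of parameter * feature count."""
    return sum(t * c for t, c in zip(theta, counts))


def log_conditional_likelihood(theta, reference, candidates):
    """BL-style objective for one molecule: log P(reference | sequence).

    `candidates` must list the feature counts of every structure considered,
    including the reference itself.
    """
    scores = [-energy(theta, c) / RT for c in candidates]
    m = max(scores)  # log-sum-exp shift for numerical stability
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return -energy(theta, reference) / RT - log_z


def violates_constraint(theta, reference, alternative, margin=0.0):
    """CG-style check: the reference should have lower energy than the alternative.

    With margin > 0 this becomes a max-margin style requirement that the
    reference be lower by at least `margin` kcal/mol.
    """
    return energy(theta, reference) + margin > energy(theta, alternative)


# Toy example: two hypothetical features (stacked pairs, hairpin loops).
theta = [-2.0, 3.5]           # hypothetical parameters in kcal/mol
reference = (3, 1)            # feature counts of the known structure
alternatives = [(1, 1), (0, 0)]
candidates = [reference] + alternatives

ll = log_conditional_likelihood(theta, reference, candidates)
violated = [a for a in alternatives if violates_constraint(theta, reference, a)]
```

Parameter estimation would then adjust `theta` to raise the summed log-likelihood over all training molecules (BL), or to remove the violated constraints (CG), which this sketch does not attempt.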