Exact learning of RNA energy parameters from structure.

Author information

Chitsaz Hamidreza, Aminisharifabad Mohammad

Affiliations

1. Department of Computer Science, Colorado State University, Fort Collins, Colorado.

2. Department of Computer Science, Wayne State University, Detroit, Michigan.

Publication information

J Comput Biol. 2015 Jun;22(6):463-73. doi: 10.1089/cmb.2014.0164. Epub 2015 Mar 10.

Abstract

We consider the problem of exactly learning the parameters of a linear RNA energy model from secondary structure data. We derive a necessary and sufficient condition for the learnability of the parameters, based on computing the convex hull of the union of the translated Newton polytopes of the input sequences. The set of learned energy parameters is characterized as the convex cone generated by the normal vectors to those facets of the resulting polytope that are incident to the origin. In practice, the entire training data set may not satisfy the sufficient condition; hence, it is often desirable to compute a maximal subset of the training data that does. We show that this problem is NP-hard in general for a feature space of arbitrary dimension. Using a randomized greedy algorithm, we select a subset of the RNA STRAND v2.0 database that satisfies the sufficient condition for a model that counts A-U, C-G, and G-U base pairs separately. The set of learned energy parameters includes the experimentally measured energies of A-U, C-G, and G-U pairs; hence, our parameter set is in agreement with the Turner parameters.
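The geometric condition in the abstract can be illustrated with a small toy sketch. It is not the authors' implementation, and it rests on several assumptions: features are the three base-pair counts (A-U, C-G, G-U), the observed structure is taken to minimize the linear energy h·c, and each polytope is "translated" by subtracting the observed structure's feature vector so that it sits at the origin. Under that convention, an admissible h must satisfy h·d ≥ 0 for every difference vector d. The helper names and the data are hypothetical, and the brute-force enumeration over pairs stands in for a real convex hull computation; it assumes the difference vectors span all three dimensions.

```python
from itertools import combinations
from math import gcd

def cross(u, v):
    """Cross product of two 3-vectors."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def primitive(v):
    """Scale an integer vector down to its shortest integer multiple."""
    g = gcd(gcd(abs(v[0]), abs(v[1])), abs(v[2]))
    return tuple(x // g for x in v)

def facet_normals_at_origin(diffs):
    """Inward normals of the planes through the origin that support all
    difference vectors (hypothetical helper, 3-D integer data only).

    In 3-D a facet incident to the origin is spanned by two difference
    vectors, so each pair's cross product is a candidate normal; it is
    kept if every difference vector lies on its non-negative side.
    """
    normals = set()
    for u, v in combinations(diffs, 2):
        n = cross(u, v)
        if n == (0, 0, 0):          # parallel pair, no plane defined
            continue
        for cand in (n, tuple(-x for x in n)):
            if all(dot(cand, d) >= 0 for d in diffs):
                normals.add(primitive(cand))
    return normals

# Toy data (invented): counts of (A-U, C-G, G-U) pairs.
observed = (2, 3, 1)
alternatives = [(1, 3, 1), (2, 2, 1), (2, 3, 0), (1, 2, 2)]
diffs = [tuple(a - o for a, o in zip(alt, observed)) for alt in alternatives]

generators = facet_normals_at_origin(diffs)
print(sorted(generators))
# [(-1, 0, -1), (-1, 0, 0), (0, -1, -1), (0, -1, 0)]

# Adding an alternative that dominates the observed structure in every
# feature leaves no supporting plane, so no generator is found.
blocked = facet_normals_at_origin(diffs + [(1, 1, 1)])
print(blocked)  # set()
```

In this toy case, any non-negative combination of the four printed generators gives a parameter vector under which the observed structure has minimal energy among the listed alternatives. An empty result means no supporting plane through the origin was found, so in this sketch no parameter vector would be learned.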
