Computational Biology Center, Memorial Sloan-Kettering Cancer Center, New York, New York 10065, USA.
RNA. 2010 May;16(5):865-78. doi: 10.1261/rna.903510. Epub 2010 Apr 1.
The use of free energy-based algorithms to compute RNA secondary structures produces, in general, large numbers of foldings. Recent research has addressed the problem of grouping structures into a small number of clusters and computing a representative folding for each cluster. At the heart of this problem is the need to compute a quantity that measures the difference between pairs of foldings. We introduce a new concept, the relaxed base-pair (RBP) score, designed to give a more biologically realistic measure of the difference between structures than the base-pair (BP) metric, which simply counts the number of base pairs that occur in one structure but not the other. The degree of relaxation is determined by a single relaxation parameter, t. When t = 0 (no relaxation), our method is the same as the BP metric. At the other extreme, a very large value of t will give a distance of 0 for identical structures and 1 for structures that differ. Scores can be recomputed with different values of t at virtually no extra computational cost, so t can be adjusted until the results are satisfactory. Our results indicate that relaxed measures give more stable and more meaningful clusters than the BP metric. We also use the RBP score to compute representative foldings for each cluster.
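As an illustration of the baseline only, the BP metric described above can be sketched as the size of the symmetric difference between two sets of base pairs. This is a minimal sketch and not the authors' implementation: it assumes structures are written in dot-bracket notation without pseudoknots (the abstract does not specify an input format), and the function names `parse_dot_bracket` and `bp_distance` are hypothetical. The RBP score itself is not reproduced here, because the abstract does not give its definition.

```python
from __future__ import annotations


def parse_dot_bracket(structure: str) -> set[tuple[int, int]]:
    """Return the base pairs (i, j), with i < j, encoded in a dot-bracket string.

    Assumption: '(' and ')' mark paired positions, '.' marks unpaired ones,
    and the structure contains no pseudoknots.
    """
    open_positions: list[int] = []
    pairs: set[tuple[int, int]] = set()
    for pos, symbol in enumerate(structure):
        if symbol == "(":
            open_positions.append(pos)
        elif symbol == ")":
            if not open_positions:
                raise ValueError(f"unmatched ')' at position {pos}")
            pairs.add((open_positions.pop(), pos))
        elif symbol != ".":
            raise ValueError(f"unexpected symbol {symbol!r} at position {pos}")
    if open_positions:
        raise ValueError(f"unmatched '(' at position {open_positions[-1]}")
    return pairs


def bp_distance(structure_a: str, structure_b: str) -> int:
    """Count the base pairs present in exactly one of the two structures."""
    pairs_a = parse_dot_bracket(structure_a)
    pairs_b = parse_dot_bracket(structure_b)
    return len(pairs_a ^ pairs_b)  # symmetric difference of the two pair sets


if __name__ == "__main__":
    helix = "((((....)))).."
    shifted = ".((((....))))."  # the same helix moved by one position
    print(bp_distance(helix, helix))
    print(bp_distance(helix, shifted))
```

In this example the two structures share no base pair exactly, so the BP metric reports all eight pairs as differences even though the helices differ only by a one-position shift. Near-identical foldings of this kind are plausibly the sort of case a relaxed measure is intended to treat more leniently.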