Barrett Christopher, He Qijun, Huang Fenix W, Reidys Christian M
1 Biocomplexity Initiative and Institute, University of Virginia, Charlottesville, Virginia.
2 Department of Computer Science, University of Virginia, Charlottesville, Virginia.
J Comput Biol. 2019 Mar;26(3):173-192. doi: 10.1089/cmb.2018.0095. Epub 2019 Jan 17.
Recently, a framework that considers RNA sequences and their secondary structures as pairs has led to information-theoretic perspectives on how the semantics encoded in RNA sequences can be inferred. This pairing arises naturally from the energy model of RNA secondary structures. Fixing the sequence in the pairing produces the RNA energy landscape, whose partition function was discovered by McCaskill. Dually, fixing the structure induces the energy landscape of sequences. The latter was originally considered for designing more efficient inverse folding algorithms and was subsequently enhanced by facilitating the sampling of sequences. We present here a partition function of sequence/structure pairs endowed with a Hamming distance and base pair distance filtration. This partition function augments the previously mentioned (dual) partition function. We develop an efficient dynamic programming routine that recursively computes the partition function with this double filtration. Our framework handles RNA secondary structures as well as 1-structures, where a 1-structure is an RNA pseudoknot structure consisting of "building blocks" of genus 0 or 1. In particular, 0-structures, which consist only of "building blocks" of genus 0, are exactly RNA secondary structures. The time complexity for calculating the partition function of 1-pairs, that is, sequence/structure pairs whose structures are 1-structures, is O(h^2 b^2 n^6), where h, b, and n denote the Hamming distance, base pair distance, and sequence length, respectively. The time complexity for the partition function of 0-pairs is O(h^2 b^2 n^3).
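To make the double filtration concrete, the following is a minimal brute-force sketch and not the authors' dynamic programming routine. It enumerates every sequence/structure pair for a very short sequence and bins the Boltzmann weights by Hamming distance h to a reference sequence and base pair distance b to a reference structure. Everything specific here is an assumption made for illustration: the toy energy model (-1 per compatible pair, +1 per incompatible pair), the minimum arc length, the value kT = 1, and the hypothetical function names. The enumeration is exponential in n, so it only serves to show what the filtered partition function counts for 0-pairs.

```python
import itertools
import math
from collections import defaultdict

# Assumption: Watson-Crick and G-U wobble pairs count as "compatible".
PAIRABLE = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def secondary_structures(n, min_loop=1):
    """Yield all non-crossing structures on positions 0..n-1 as frozensets of (i, j)."""
    def build(i, j):
        # Interval too short to hold any base pair: only the empty structure.
        if j - i < min_loop + 1:
            yield frozenset()
            return
        # Case 1: position i is unpaired.
        yield from build(i + 1, j)
        # Case 2: position i pairs with k; the interval splits into inside and outside.
        for k in range(i + min_loop + 1, j + 1):
            for inner in build(i + 1, k - 1):
                for outer in build(k + 1, j):
                    yield inner | outer | {(i, k)}
    yield from build(0, n - 1)


def energy(seq, struct):
    """Toy energy (an assumption): -1 per compatible pair, +1 per incompatible pair."""
    return sum(-1.0 if (seq[i], seq[j]) in PAIRABLE else 1.0 for i, j in struct)


def filtered_partition(ref_seq, ref_struct, kT=1.0):
    """Return {(h, b): sum of Boltzmann weights} over all sequence/structure pairs."""
    n = len(ref_seq)
    structs = list(secondary_structures(n))
    Q = defaultdict(float)
    for letters in itertools.product("ACGU", repeat=n):
        # Hamming distance to the reference sequence.
        h = sum(a != r for a, r in zip(letters, ref_seq))
        for s in structs:
            # Base pair distance: size of the symmetric difference of the pair sets.
            b = len(s ^ ref_struct)
            Q[(h, b)] += math.exp(-energy(letters, s) / kT)
    return dict(Q)


Q = filtered_partition("GAAC", frozenset({(0, 3)}))
for (h, b), weight in sorted(Q.items()):
    print(h, b, weight)
```

Summing the entries over all (h, b) recovers the unfiltered partition function of sequence/structure pairs under the same toy model, which is a convenient sanity check for any faster implementation.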