Department of Computer Science, University of Georgia, Athens, Georgia 30602, USA.
BMC Bioinformatics. 2012 Apr 12;13 Suppl 5(Suppl 5):S1. doi: 10.1186/1471-2105-13-S5-S1.
The computational identification of RNAs in genomic sequences requires detecting signals characteristic of RNA sequences. Shannon base pairing entropy is an indicator of the certainty of an RNA secondary structure fold and is used to detect structural non-coding RNAs (ncRNAs). Under the Boltzmann ensemble of secondary structures, the probability of a base pair is estimated from its weighted frequency across all alternative equilibrium structures. However, this entropy has yet to deliver the desired performance in distinguishing ncRNAs from random sequences. New methods that improve the performance of the entropy measure may therefore lead to more effective ncRNA gene finding based on structure detection.
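To make the entropy definition concrete, the following minimal Python sketch computes base-pair probabilities and the Shannon base pairing entropy for a short sequence by brute force. It assumes a deliberately tiny, hypothetical energy model (canonical pairs only, hairpin loops of at least 3 nt, and a fixed bonus `STACK_ENERGY` per stacked pair, in units of kT). It enumerates every pseudoknot-free structure, weights each by its Boltzmann factor exp(-E/kT), sums the weights of structures containing each pair (i, j) to obtain p_ij, and then evaluates Q = -Σ p_ij log2 p_ij. The names, parameters, and this particular entropy form are illustrative assumptions; some formulations also include unpaired-base terms or normalize by sequence length. Practical tools obtain p_ij with McCaskill's partition-function dynamic programming and nearest-neighbor energy parameters rather than by enumeration.

```python
import math
from functools import lru_cache

# Hypothetical toy energy model (NOT the paper's or any package's parameters):
# canonical pairs only (Watson-Crick plus G-U wobble), hairpin loops of at least
# MIN_LOOP unpaired bases, and STACK_ENERGY (in units of kT) for every pair
# (i, j) stacked directly on (i + 1, j - 1).
CANONICAL = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
MIN_LOOP = 3
STACK_ENERGY = -1.5


def enumerate_structures(seq):
    """Every pseudoknot-free secondary structure of seq, as frozensets of (i, j)."""

    @lru_cache(maxsize=None)
    def structs(i, j):
        if j - i < MIN_LOOP + 1:  # interval too short to close any hairpin
            return (frozenset(),)
        out = list(structs(i + 1, j))  # case 1: base i stays unpaired
        for k in range(i + MIN_LOOP + 1, j + 1):  # case 2: base i pairs with k
            if (seq[i], seq[k]) in CANONICAL:
                for inner in structs(i + 1, k - 1):
                    for outer in structs(k + 1, j):
                        out.append(inner | outer | {(i, k)})
        return tuple(out)

    return structs(0, len(seq) - 1)


def energy(pairs):
    """Toy free energy: STACK_ENERGY per stacked pair, 0 for everything else."""
    return STACK_ENERGY * sum((i + 1, j - 1) in pairs for i, j in pairs)


def pair_probabilities(seq, kT=1.0):
    """p_ij = summed Boltzmann weight of structures containing (i, j), over Z."""
    structures = enumerate_structures(seq)
    weights = [math.exp(-energy(s) / kT) for s in structures]
    z = sum(weights)  # partition function of the toy ensemble
    probs = {}
    for s, w in zip(structures, weights):
        for pair in s:
            probs[pair] = probs.get(pair, 0.0) + w / z
    return probs


def base_pairing_entropy(probs):
    """Shannon base pairing entropy Q = -sum over pairs of p_ij * log2(p_ij)."""
    return -sum(p * math.log2(p) for p in probs.values() if p > 0)


probs = pair_probabilities("GGAAAACC")
print(sorted(probs.items()))
print(base_pairing_entropy(probs))
```

Because the number of structures grows exponentially with length, this enumeration is only a teaching device for sequences of a few dozen bases at most.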
This paper shows that the performance of base pairing entropy as a measure can be significantly improved by using a constrained secondary structure ensemble in which only canonical base pairs are assumed to occur in energetically stable stems of a fold. This constraint reduces the secondary structure space and may lower the probabilities of base pairs that are unfavorable to the native fold. Indeed, base pairing entropies computed with this constrained model show substantially narrowed Z-score gaps between ncRNAs, as well as drastic increases in Z-score for all 13 tested ncRNA sets, relative to shuffled sequences.
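The effect of a stem constraint and the comparison against shuffled sequences can be sketched with the same kind of toy model. In the Python sketch below, the constraint is a hypothetical reading of "canonical pairs in energetically stable stems": a structure is admitted only if every base pair stacks on a neighboring pair (no isolated pairs). The Z-score is computed as (mean shuffled entropy - native entropy) / standard deviation over mononucleotide shuffles, so a positive value means the native sequence folds with lower entropy than its shuffles. The energy parameters, the shuffle type, the number of shuffles, the sign convention, and all function names are assumptions for illustration, not the paper's protocol.

```python
import math
import random
import statistics
from functools import lru_cache

# Hypothetical toy model (not the paper's parameters): canonical pairs only,
# hairpin loops of >= MIN_LOOP bases, STACK_ENERGY (units of kT) per stacked pair.
CANONICAL = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
MIN_LOOP = 3
STACK_ENERGY = -1.5


def enumerate_structures(seq):
    """All pseudoknot-free secondary structures of seq, as frozensets of (i, j)."""

    @lru_cache(maxsize=None)
    def structs(i, j):
        if j - i < MIN_LOOP + 1:
            return (frozenset(),)
        out = list(structs(i + 1, j))  # base i unpaired
        for k in range(i + MIN_LOOP + 1, j + 1):  # base i paired with k
            if (seq[i], seq[k]) in CANONICAL:
                for inner in structs(i + 1, k - 1):
                    for outer in structs(k + 1, j):
                        out.append(inner | outer | {(i, k)})
        return tuple(out)

    return structs(0, len(seq) - 1)


def all_pairs_stacked(pairs):
    """Hypothetical stem constraint: every pair has a stacking neighbor pair."""
    return all((i + 1, j - 1) in pairs or (i - 1, j + 1) in pairs for i, j in pairs)


def entropy(seq, constrained, kT=1.0):
    """Base pairing entropy over the full or the stem-constrained ensemble."""
    structures = enumerate_structures(seq)
    if constrained:
        structures = [s for s in structures if all_pairs_stacked(s)]
    stacks = [sum((i + 1, j - 1) in s for i, j in s) for s in structures]
    weights = [math.exp(-STACK_ENERGY * n / kT) for n in stacks]
    z = sum(weights)  # the empty structure always survives, so z >= 1
    probs = {}
    for s, w in zip(structures, weights):
        for pair in s:
            probs[pair] = probs.get(pair, 0.0) + w / z
    return -sum(p * math.log2(p) for p in probs.values() if p > 0)


def z_score(seq, constrained, n_shuffles=50, seed=0):
    """(mean shuffled entropy - native entropy) / stdev of shuffled entropies."""
    rng = random.Random(seed)
    native = entropy(seq, constrained)
    background = []
    for _ in range(n_shuffles):  # needs n_shuffles >= 2 for a standard deviation
        letters = list(seq)
        rng.shuffle(letters)  # mononucleotide shuffle keeps base composition
        background.append(entropy("".join(letters), constrained))
    sd = statistics.stdev(background)
    return (statistics.mean(background) - native) / sd if sd > 0 else 0.0


hairpin = "GGCGCAAAAGCGCC"
print("unconstrained Z:", z_score(hairpin, constrained=False))
print("constrained Z:  ", z_score(hairpin, constrained=True))
```

Filtering after enumeration keeps the sketch short; a practical implementation would build the stem constraint directly into the partition-function recursions so that long sequences remain tractable.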
These results suggest the viability of developing effective structure-based ncRNA gene finding methods by investigating secondary structure ensembles of ncRNAs.