Toyota Technological Institute at Chicago, IL, USA.
Bioinformatics. 2011 Jul 1;27(13):i102-10. doi: 10.1093/bioinformatics/btr232.
Accurate tertiary structures are important for the functional study of non-coding RNA molecules. However, predicting RNA tertiary structure is extremely challenging, both because the conformation space to be explored is large and because no accurate scoring function is available to distinguish the native structure from decoys. Fragment-based conformation sampling methods (e.g. FARNA) have the shortcoming that a fragment library of limited size cannot represent all possible conformations well. A recent dynamic Bayesian network method, BARNACLE, overcomes this limitation of fragment assembly. However, neither FARNA nor BARNACLE makes use of sequence information when sampling conformations. Here, we present a new probabilistic graphical model, based on conditional random fields (CRFs), of the RNA sequence-structure relationship, which enables us to accurately estimate the probability of an RNA conformation from its sequence. Coupled with a novel tree-guided sampling scheme, our CRF model is then applied to RNA conformation sampling. Experimental results show that our CRF method models the RNA sequence-structure relationship well and that sequence information is important for conformation sampling. Our method, named TreeFolder, generates a much higher percentage of native-like decoys than FARNA and BARNACLE, even though it uses the same simple energy function as BARNACLE.
zywang@ttic.edu; j3xu@ttic.edu
Supplementary data are available at Bioinformatics online.
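As a rough illustration of what "estimating the probability of a conformation from sequence" means for a CRF, the sketch below scores a path of discrete conformation labels conditioned on a nucleotide sequence with a toy linear-chain CRF. It is not the TreeFolder model: the labels in `STATES`, the weights produced by `toy_weights`, and all function names are hypothetical, and the tree-guided sampling scheme is not reproduced.

```python
import itertools
import math

# Hypothetical discrete conformation labels (an assumption for illustration).
STATES = ("s0", "s1", "s2")
NUCLEOTIDES = "ACGU"


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def toy_weights():
    """Made-up feature weights; a real model would learn these from data."""
    emit = {
        (nt, s): 0.3 * ((NUCLEOTIDES.index(nt) + i) % 3)
        for nt in NUCLEOTIDES
        for i, s in enumerate(STATES)
    }
    # Mildly favor staying in the same state between neighbors.
    trans = {(p, s): (0.5 if p == s else 0.0) for p in STATES for s in STATES}
    return emit, trans


def path_score(states, sequence, emit, trans):
    """Unnormalized log score of one label path given the sequence."""
    score = emit[(sequence[0], states[0])]
    for i in range(1, len(sequence)):
        score += trans[(states[i - 1], states[i])] + emit[(sequence[i], states[i])]
    return score


def log_partition(sequence, emit, trans):
    """Log normalizer over all label paths, via the forward algorithm."""
    alpha = {s: emit[(sequence[0], s)] for s in STATES}
    for nt in sequence[1:]:
        alpha = {
            s: emit[(nt, s)] + logsumexp([alpha[p] + trans[(p, s)] for p in STATES])
            for s in STATES
        }
    return logsumexp(list(alpha.values()))


def log_prob(states, sequence, emit, trans):
    """log P(states | sequence) under the toy linear-chain CRF."""
    return path_score(states, sequence, emit, trans) - log_partition(sequence, emit, trans)


if __name__ == "__main__":
    emit, trans = toy_weights()
    seq = "GGAUC"
    path = ("s0", "s0", "s1", "s2", "s2")
    print(math.exp(log_prob(path, seq, emit, trans)))
```

Because the normalizer depends on the input sequence, the same label path receives different probabilities for different sequences, which is the sense in which a CRF conditions structure on sequence.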