BMC Bioinformatics. 2012 May 4;13:78. doi: 10.1186/1471-2105-13-78.
Stochastic Context-Free Grammars (SCFGs) were applied successfully to RNA secondary structure prediction in the early 1990s and were combined with comparative methods in the late 1990s. The set of SCFGs potentially useful for RNA secondary structure prediction is very large, yet a few intuitively designed grammars have remained dominant. In this paper we investigate two automatic techniques for finding effective grammars: exhaustive search for very compact grammars, and an evolutionary algorithm for finding larger grammars. We also examine whether grammar ambiguity is as problematic for structure prediction as previously suggested.
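To make concrete how an SCFG turns a sequence into a structure, the sketch below runs a CYK-style Viterbi parse under one very compact grammar, S → aS | aSa′S | ε, where a and a′ denote a base pair. It is a minimal sketch under stated assumptions, not the authors' method or any of the grammars they report: the grammar choice, the rule and emission probabilities, the minimum loop length `MIN_LOOP` and the function name `viterbi_fold` are all hypothetical and hand-picked for illustration rather than trained on data.

```python
import math

# Hypothetical, hand-picked parameters for illustration only (not trained
# values and not taken from the paper). Rule probabilities sum to 1.
P_UNPAIRED, P_PAIR, P_END = 0.4, 0.4, 0.2
# Hypothetical pair emission probabilities (canonical and G-U pairs; sum to 1).
PAIR_EMIT = {("G", "C"): 0.3, ("C", "G"): 0.3, ("A", "U"): 0.15,
             ("U", "A"): 0.15, ("G", "U"): 0.05, ("U", "G"): 0.05}
UNPAIRED_EMIT = 0.25  # uniform over A, C, G, U
MIN_LOOP = 3          # assumed minimum number of bases enclosed by a pair


def viterbi_fold(seq):
    """Return (log-probability, dot-bracket) of the most probable parse of
    seq under the toy grammar S -> a S | a S a' S | epsilon."""
    n = len(seq)
    log_unp = math.log(P_UNPAIRED * UNPAIRED_EMIT)
    # V[i][j]: best log-probability that S derives seq[i:j] (half-open).
    V = [[-math.inf] * (n + 1) for _ in range(n + 1)]
    # choice[i][j]: -1 if seq[i] is unpaired, else the index k it pairs with.
    choice = [[-1] * (n + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        V[i][i] = math.log(P_END)  # S -> epsilon
    for length in range(1, n + 1):
        for i in range(n - length + 1):
            j = i + length
            best, arg = log_unp + V[i + 1][j], -1  # S -> a S
            for k in range(i + MIN_LOOP + 1, j):   # S -> a S a' S
                e = PAIR_EMIT.get((seq[i], seq[k]))
                if e:
                    score = math.log(P_PAIR * e) + V[i + 1][k] + V[k + 1][j]
                    if score > best:
                        best, arg = score, k
            V[i][j], choice[i][j] = best, arg
    # Trace back through the stored choices to recover the structure.
    structure = ["."] * n
    stack = [(0, n)]
    while stack:
        i, j = stack.pop()
        if i == j:
            continue
        k = choice[i][j]
        if k < 0:
            stack.append((i + 1, j))
        else:
            structure[i], structure[k] = "(", ")"
            stack.extend([(i + 1, k), (k + 1, j)])
    return V[0][n], "".join(structure)


print(viterbi_fold("GGGAAACCC")[1])  # (((...)))
```

In this grammar the rule applied at each step is fixed by whether the leftmost remaining base is unpaired or paired, so every structure has exactly one parse and the most probable parse is also the most probable structure under these parameters.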
These search techniques were applied to RNA secondary structure prediction on a maximal data set and revealed new and interesting grammars, though none is dramatically better than the classic grammars. In general, the results showed that many grammars with quite different structures can have very similar predictive ability. Many ambiguous grammars were found that were at least as effective as the best current unambiguous grammars.
Overall, the method of evolving SCFGs for RNA secondary structure prediction proved effective, finding many grammars with strong predictive accuracy, as good as or slightly better than those designed manually. Furthermore, several of the best grammars found were ambiguous, demonstrating that such grammars should not be disregarded.
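A grammar is ambiguous when a single secondary structure can be produced by more than one parse tree; the probability of that structure is then spread over several parses, which is the usual reason ambiguity is considered a problem, since the most probable parse need not correspond to the most probable structure. The sketch below counts parse trees per dot-bracket structure for the compact grammar S → aS | aSa′S | ε and for a hypothetical ambiguous variant that adds a right-emitting rule S → Sa. Both grammars and the helper names `partners` and `count_parses` are illustrative assumptions, not grammars from the paper.

```python
from functools import lru_cache


def partners(structure):
    """Map each '(' index to the index of its matching ')'."""
    stack, pair = [], {}
    for idx, ch in enumerate(structure):
        if ch == "(":
            stack.append(idx)
        elif ch == ")":
            pair[stack.pop()] = idx
    return pair


def count_parses(structure, allow_right_emit):
    """Count parse trees of a dot-bracket structure under
    S -> a S | a S a' S | epsilon, optionally adding S -> S a."""
    pair = partners(structure)

    @lru_cache(maxsize=None)
    def count(i, j):
        # Number of ways S can derive structure[i:j] (half-open).
        if i == j:
            return 1  # S -> epsilon
        total = 0
        if structure[i] == ".":
            total += count(i + 1, j)  # S -> a S
        if allow_right_emit and structure[j - 1] == ".":
            total += count(i, j - 1)  # S -> S a (hypothetical extra rule)
        if structure[i] == "(":
            k = pair[i]
            total += count(i + 1, k) * count(k + 1, j)  # S -> a S a' S
        return total

    return count(0, len(structure))


for s in ["....", "(((...)))"]:
    print(s, count_parses(s, False), count_parses(s, True))
# .... 1 16
# (((...))) 1 8
```

Adding the single rule S → Sa lets each unpaired base be emitted from either end, so an unpaired run of n bases has 2^n parses while the unambiguous grammar keeps exactly one per structure.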