Cai Liming, Malmberg Russell L, Wu Yunzhou
Department of Computer Science, The University of Georgia, Athens, Georgia 30602, USA.
Bioinformatics. 2003;19 Suppl 1:i66-73. doi: 10.1093/bioinformatics/btg1007.
Modeling RNA pseudoknotted structures remains challenging. Methods have previously been developed to model RNA stem-loops successfully using stochastic context-free grammars (SCFGs) adapted from computational linguistics; however, the additional complexity of pseudoknots has made modeling them more difficult. Formally, a context-sensitive grammar is required, which would impose a large increase in complexity.
We introduce a new grammar modeling approach for RNA pseudoknotted structures based on parallel communicating grammar systems (PCGS). Our new approach can specify pseudoknotted structures while avoiding context-sensitive rules by using a single context-free grammar (CFG) synchronized with a number of regular grammars. Technically, the stochastic version of the grammar model can be as simple as an SCFG. As with SCFGs, the new approach permits automatic generation of a single-RNA structure prediction algorithm for each specified pseudoknotted structure model. This approach also makes it possible to develop full probabilistic models of pseudoknotted structures, allowing the prediction of consensus structures by comparative analysis and structural homology recognition in database searches.
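The distinction between stem-loops and pseudoknots can be illustrated with a minimal Python sketch. It is not the PCGS model or any algorithm from the paper; it only shows the structural property that puts pseudoknots beyond a context-free grammar: base pairs that cross (i < k < j < l) rather than nest. The function names and the example pair lists are hypothetical and chosen for illustration only.

```python
def crossing_pairs(pairs):
    """Return every pair of base pairs (i, j), (k, l) with i < k < j < l.

    `pairs` is a list of (position, position) tuples using 0-based
    sequence indices. Nested or disjoint pairs are not reported.
    """
    # Normalise each pair so the smaller index comes first, then sort
    # by opening position so that i <= k for every comparison below.
    ordered = sorted((min(a, b), max(a, b)) for a, b in pairs)
    found = []
    for idx, (i, j) in enumerate(ordered):
        for k, l in ordered[idx + 1:]:
            if i < k < j < l:  # the second pair opens inside, closes outside
                found.append(((i, j), (k, l)))
    return found


def is_pseudoknotted(pairs):
    """True if at least two base pairs cross."""
    return bool(crossing_pairs(pairs))


# Hypothetical example structures (illustrative indices only).
stem_loop = [(0, 9), (1, 8), (2, 7)]        # fully nested pairs
h_type = [(0, 6), (1, 5), (3, 10), (4, 9)]  # two stems that cross

print(is_pseudoknotted(stem_loop))
print(is_pseudoknotted(h_type))
```

A purely nested pair set such as `stem_loop` can be derived by context-free rules of the form S → a S u, whereas the crossing stems in `h_type` cannot, which is the gap the synchronized grammars described above are meant to close.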