Matsui Hiroshi, Sato Kengo, Sakakibara Yasubumi
Department of Biosciences and Informatics, Keio University, Kohoku-ku, Yokohama, Japan.
Proc IEEE Comput Syst Bioinform Conf. 2004:290-9.
Since the whole genome sequences for many species are currently available, computational predictions of RNA secondary structures and computational identifications of those non-coding RNA regions by comparative genomics become important, and require more advanced alignment methods. Recently, an approach of structural alignments for RNA sequences has been introduced to solve these problems. By structural alignments, we mean a pairwise alignment to align an unfolded RNA sequence into a folded RNA sequence of known secondary structure. Pair HMMs on tree structures (PHMMTSs) proposed by Sakakibara are efficient automata-theoretic models for structural alignments of RNA secondary structures, but are incapable of handling pseudoknots. On the other hand, tree adjoining grammars (TAGs) is a subclass of context-sensitive grammar, which is suitable for modeling pseudoknots. Our goal is to extend PHMMTSs by incorporating TAGs to be able to handle pseudoknots.
We propose the pair stochastic tree adjoining grammars (PSTAGs) for modeling RNA secondary structures including pseudoknots and show the strong experimental evidences that modeling pseudoknot structures significantly improves the prediction accuracies of RNA secondary structures. First, we extend the notion of PHMMTSs defined on alignments of 'trees' to PSTAGs defined on alignments of "TAG (derivation) trees", which represent a top-down parsing process of TAGs and are functionally equivalent to derived trees of TAGs. Second, we modify PSTAGs so that it takes as input a pair of a linear sequence and a TAG tree representing a pseudoknot structure of RNA to produce a structural alignment. Then, we develop a polynomial-time algorithm for obtaining an optimal structural alignment by PSTAGs, based on dynamic programming parser. We have done several computational experiments for predicting pseudoknots by PSTAGs, and our computational experiments suggests that prediction of RNA pseudoknot structures by our method are more efficient and biologically plausible than by other conventional methods. The binary code for PSTAG method is freely available from our website at http://www.dna.bio.keio.ac.jp/pstag/.
由于目前许多物种的全基因组序列均可获取,通过比较基因组学对RNA二级结构进行计算预测以及对那些非编码RNA区域进行计算识别变得愈发重要,这就需要更先进的比对方法。最近,一种用于RNA序列的结构比对方法被引入以解决这些问题。通过结构比对,我们指的是将一个未折叠的RNA序列与一个已知二级结构的折叠RNA序列进行比对的双序列比对。由坂木原提出的树结构上的配对隐马尔可夫模型(PHMMTS)是用于RNA二级结构结构比对的高效自动机理论模型,但无法处理假结。另一方面,树邻接文法(TAG)是上下文敏感文法的一个子类,适用于对假结进行建模。我们的目标是通过合并TAG来扩展PHMMTS,使其能够处理假结。
我们提出了用于对包括假结的RNA二级结构进行建模的配对随机树邻接文法(PSTAG),并展示了强有力的实验证据,即对假结结构进行建模能显著提高RNA二级结构的预测准确性。首先,我们将在“树”比对上定义的PHMMTS的概念扩展到在“TAG(推导)树”比对上定义的PSTAG,“TAG(推导)树”表示TAG的自顶向下解析过程,并且在功能上等同于TAG的派生树。其次,我们对PSTAG进行修改,使其以一个线性序列和一个表示RNA假结结构的TAG树作为输入,以生成一个结构比对。然后,我们基于动态规划解析器开发了一种多项式时间算法,用于通过PSTAG获得最优结构比对。我们已经进行了几项通过PSTAG预测假结的计算实验,我们的计算实验表明,我们的方法对RNA假结结构的预测比其他传统方法更高效且在生物学上更合理。PSTAG方法的二进制代码可从我们的网站http://www.dna.bio.keio.ac.jp/pstag/免费获取。