Sato Kengo, Morita Kensuke, Sakakibara Yasubumi
Japan Biological Informatics Consortium, 2-45 Aomi, Koto-ku, Tokyo 135-8073, Japan.
J Math Biol. 2008 Jan;56(1-2):201-14. doi: 10.1007/s00285-007-0108-4. Epub 2007 Jul 7.
Computational identification of non-coding RNA regions in genomes is currently receiving considerable attention. The task is inherently more difficult than detecting protein-coding genes because non-coding RNA regions carry only weak statistical signals. However, most functional RNA families have conserved sequences and conserved secondary structures that are characteristic of their molecular function in the cell; these are known as sequence motifs and consensus structures, respectively. In this paper, we propose an improved method that extends a pairwise structural alignment method for RNA sequences to handle position-specific scoring matrices (PSSMs), which we use to model sequence motifs, and thereby incorporates motifs into the structural alignment of RNA sequences. Experimental results show that PSSMs enable us to find individual RNA families efficiently, especially when biological knowledge such as sequence motifs is available.
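To make the idea of a position-specific scoring matrix concrete, the following is a minimal sketch of how a PSSM might be built from aligned motif instances and used to score sequence windows. It is not the authors' implementation and does not perform structural alignment; the function names, the pseudocount, the uniform background frequency, and the example motif instances are all assumptions chosen for illustration.

```python
import math

ALPHABET = "ACGU"  # RNA bases


def build_pssm(instances, pseudocount=1.0, background=0.25):
    """Build a log-odds PSSM from equal-length motif instances.

    Returns one {base: log2-odds score} dict per motif column.
    The pseudocount and uniform background are illustrative assumptions.
    """
    length = len(instances[0])
    if any(len(s) != length for s in instances):
        raise ValueError("motif instances must all have the same length")
    pssm = []
    for col in range(length):
        counts = {b: pseudocount for b in ALPHABET}
        for s in instances:
            counts[s[col]] += 1
        total = sum(counts.values())
        pssm.append(
            {b: math.log2(counts[b] / total / background) for b in ALPHABET}
        )
    return pssm


def score_window(pssm, window):
    """Sum the per-column log-odds scores for one window of motif width."""
    return sum(col[base] for col, base in zip(pssm, window))


def best_hit(pssm, sequence):
    """Return (score, start) of the highest-scoring window in the sequence."""
    width = len(pssm)
    if len(sequence) < width:
        raise ValueError("sequence is shorter than the motif")
    return max(
        (score_window(pssm, sequence[i:i + width]), i)
        for i in range(len(sequence) - width + 1)
    )


# Hypothetical motif instances, made up for this example.
motif_instances = ["GAAA", "GAGA", "GCAA", "GAAA"]
pssm = build_pssm(motif_instances)
score, start = best_hit(pssm, "CCUGAAACC")
print(f"best window starts at {start} with score {score:.2f}")
```

In the method described in the abstract, scores of this kind are incorporated into the structural alignment of RNA sequences; the sketch shows only the standalone sequence-scoring step.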