Tabei Yasuo, Tsuda Koji, Kin Taishin, Asai Kiyoshi
Department of Computational Biology, Graduate School of Frontier Science, University of Tokyo, CB04 Kiban-tou 5-1-5 Kashiwanoha, Kashiwa, Chiba, 277-8561, Japan.
Bioinformatics. 2006 Jul 15;22(14):1723-9. doi: 10.1093/bioinformatics/btl177. Epub 2006 May 11.
The functions of non-coding RNAs are strongly related to their secondary structures, but secondary structure prediction from a single sequence is known to be unreliable. To analyze a new non-coding RNA whose exact secondary structure is unknown, we therefore have to collect similar RNA sequences that share a common secondary structure. Sequence comparison in searches for similar RNAs should consequently consider not only sequence similarity but also potential secondary structure. Sankoff's algorithm predicts the common secondary structure of the sequences, but it is computationally too expensive for large-scale analyses. Because we often want to compare a large number of cDNA sequences or to search for similar RNAs in whole genome sequences, much faster algorithms are required.
We propose a new method of comparing RNA sequences based on structural alignments of fixed-length fragments of stem candidates. The implemented software, SCARNA (Stem Candidate Aligner for RNAs), is fast enough to apply to long sequences in large-scale analyses. The accuracy of its alignments is comparable to or better than that of much slower existing algorithms.
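To make the notion of a fixed-length stem candidate concrete, the following is a minimal illustrative sketch, not SCARNA's actual procedure. It assumes that a stem candidate is a pair of fragments of equal length whose bases can all pair with each other (Watson-Crick and G-U wobble pairs) and that are separated by a minimum loop length. The function name `stem_candidates` and the parameters `stem_len` and `min_loop` are hypothetical and chosen only for this example.

```python
# Pairs treated as complementary in this sketch (an assumption, not taken from the paper).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def stem_candidates(seq, stem_len=3, min_loop=3):
    """Enumerate fixed-length stem candidates in an RNA sequence.

    Returns a list of (i, j) index pairs (0-based), where the left fragment
    covers seq[i : i + stem_len] and the right fragment ends at position j,
    so that seq[i + k] pairs with seq[j - k] for every k < stem_len.
    """
    seq = seq.upper().replace("T", "U")  # accept DNA-style input as well
    n = len(seq)
    found = []
    # The left fragment must leave room for the loop and the right fragment.
    for i in range(n - 2 * stem_len - min_loop + 1):
        # Smallest j that keeps at least min_loop unpaired bases between fragments.
        for j in range(i + 2 * stem_len + min_loop - 1, n):
            if all((seq[i + k], seq[j - k]) in PAIRS for k in range(stem_len)):
                found.append((i, j))
    return found


if __name__ == "__main__":
    print(stem_candidates("GGGAAACCC"))
```

A structural alignment step would then compare such candidates between two sequences; that step is not shown here because the abstract does not specify it.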
The SCARNA web server, with a graphical structural alignment viewer, is available at http://www.scarna.org/.