Corpet F, Michot B
Institut National de la Recherche Agronomique (INRA), Laboratoire de Génétique Cellulaire, Castanet Tolosan, France.
Comput Appl Biosci. 1994 Jul;10(4):389-99. doi: 10.1093/bioinformatics/10.4.389.
We have developed an algorithm and a computer program for aligning new RNA sequences with a bank of already aligned homologous RNA sequences. Given a folding structure common to the bank, the program computes an alignment between the bank and a new sequence that is optimal with respect to both primary and secondary structure. The method is useful for aligning sequences that share a common folding structure despite extensive divergence of their primary structures. It allows these conserved regions to be precisely distinguished from domains whose secondary structure is more variable. An optimal alignment of a sequence of length N with a bank of homologous sequences of length M is computed in O(M²N³) time and O(M²N²) space. For sequences too long for an algorithm of this complexity, we propose first computing a classical alignment (using primary structure data only) and then improving it with the new algorithm in the regions where the bank stems are not aligned with possible stems in the new sequence. The algorithm has been implemented in Turbo Pascal on a PC and has been used to align eubacterial large-subunit ribosomal RNA sequences.
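The abstract does not give the recurrence, so the following is only a minimal Python sketch of one dynamic program with the stated O(M²N³) time and O(M²N²) space, not the authors' Turbo Pascal implementation. All of the following are assumptions made for illustration: the bank is a list of equal-length gapped strings, its common folding is a nested dot-bracket string over the bank columns, each column is scored with a simple +1/-1 profile, every gap costs the hypothetical constant GAP, and a bank base pair earns the hypothetical PAIR_BONUS when both of its columns land on sequence bases that can pair (Watson-Crick or G-U). The memoised function f covers every pair of bank and sequence intervals (O(M²N²) states), and the stem case tries O(N) split points. The helper unsupported_pairs, equally hypothetical, illustrates the hybrid strategy: given a primary-structure alignment, it lists the bank stems that the new sequence does not support, i.e. the regions to realign.

```python
from functools import lru_cache

GAP = -2         # hypothetical linear gap penalty
PAIR_BONUS = 3   # hypothetical reward when a bank pair maps onto a pairable base pair
CAN_PAIR = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def pair_table(dot_bracket):
    """Map each bank column to its partner column (or -1) from a nested dot-bracket string."""
    stack, partner = [], [-1] * len(dot_bracket)
    for i, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            j = stack.pop()
            partner[i], partner[j] = j, i
    return partner


def align_score(bank, structure, seq):
    """Best score for aligning seq to the bank, rewarding bank stems the sequence can form."""
    m, n = len(structure), len(seq)
    partner = pair_table(structure)

    def sub(col, base):
        # Profile score: +1 per bank row with the same base, -1 per row with another base;
        # rows with a gap in this column score 0.
        return sum((1 if row[col] == base else -1) for row in bank if row[col] != "-")

    @lru_cache(maxsize=None)
    def f(i, j, s, t):
        # Best score for bank columns i..j against sequence positions s..t (inclusive, may be empty).
        if i > j:
            return GAP * max(0, t - s + 1)   # remaining sequence bases are insertions
        if s > t:
            return GAP * (j - i + 1)         # remaining bank columns are deletions
        best = max(GAP + f(i + 1, j, s, t),                 # bank column i deleted
                   GAP + f(i, j, s + 1, t),                 # seq[s] inserted
                   sub(i, seq[s]) + f(i + 1, j, s + 1, t))  # column i matched, no pair credit
        p = partner[i]
        if i < p <= j:
            # Column i -> seq[s] and its partner column p -> seq[q]; the stem splits the problem
            # into the enclosed part and the part to the right of the stem.
            for q in range(s + 1, t + 1):
                bonus = PAIR_BONUS if (seq[s], seq[q]) in CAN_PAIR else 0
                best = max(best, sub(i, seq[s]) + sub(p, seq[q]) + bonus
                           + f(i + 1, p - 1, s + 1, q - 1) + f(p + 1, j, q + 1, t))
        return best

    return f(0, m - 1, 0, n - 1)


def unsupported_pairs(structure, column_to_pos, seq):
    """Bank pairs (i, p) that a primary-structure alignment does not map onto a pairable base pair.

    column_to_pos maps each bank column to a sequence index, or None where the sequence has a gap.
    """
    partner = pair_table(structure)
    bad = []
    for i, p in enumerate(partner):
        if i < p:
            a, b = column_to_pos[i], column_to_pos[p]
            if a is None or b is None or (seq[a], seq[b]) not in CAN_PAIR:
                bad.append((i, p))
    return bad
```

For example, with the two-row bank GGAAACC / GCAAAGC (a compensatory change in the inner pair) and the structure ((...)), the ungapped alignment of GGAAACC scores 16: 10 from the profile plus two stem bonuses. Memoising over all interval pairs is what makes the space cost O(M²N²); a practical version would also keep a traceback to recover the alignment itself.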