Han K, Kim H J
Department of Computer Science, Rutgers University, Piscataway, NJ 08855.
Nucleic Acids Res. 1993 Mar 11;21(5):1251-7. doi: 10.1093/nar/21.5.1251.
We have developed an algorithm and a computer program for simultaneously folding homologous RNA sequences. Given an alignment of M homologous sequences of length N, the program performs phylogenetic comparative analysis and predicts a common secondary structure conserved in the sequences. When the structure is not uniquely determined, it infers multiple structures which appear most plausible. This method is superior to energy minimization methods in the sense that it is not sensitive to point mutation of a sequence. It is also superior to usual phylogenetic comparative methods in that it does not require manual scrutiny for covariation or secondary structures. The most plausible 1-5 structures are produced in O(MN2 + N3) time and O(N2) space, which are the same requirements as those of widely used dynamic programs based on energy minimization for folding a single sequence. This is the first algorithm probably practical both in terms of time and space for finding secondary structures of homologous RNA sequences. The algorithm has been implemented in C on a Sun SparcStation, and has been verified by testing on tRNAs, 5S rRNAs, 16S rRNAs, TAR RNAs of human immunodeficiency virus type 1 (HIV-1), and RRE RNAs of HIV-1. We have also applied the program to cis-acting packaging sequences of HIV-1, for which no generally accepted structures yet exist, and propose potentially stable structures. Simulation of the program with random sequences with the same base composition and the same degree of similarity as the above sequences shows that structures common to homologous sequences are very unlikely to occur by chance in random sequences.
我们开发了一种用于同时折叠同源RNA序列的算法和计算机程序。给定长度为N的M条同源序列的比对,该程序进行系统发育比较分析,并预测序列中保守的共同二级结构。当结构不能唯一确定时,它会推断出多个最合理的结构。该方法优于能量最小化方法,因为它对序列的点突变不敏感。它也优于常规的系统发育比较方法,因为它不需要人工检查共变或二级结构。最合理的1 - 5种结构在O(MN2 + N3)时间和O(N2)空间内生成,这与基于能量最小化折叠单条序列的广泛使用的动态程序的要求相同。这可能是第一种在时间和空间方面都切实可行的用于寻找同源RNA序列二级结构的算法。该算法已用C语言在Sun SparcStation上实现,并通过对tRNA、5S rRNA、16S rRNA、人类免疫缺陷病毒1型(HIV - 1)的TAR RNA以及HIV - 1的RRE RNA进行测试得到验证。我们还将该程序应用于HIV - 1的顺式作用包装序列,目前尚无普遍接受的结构,我们提出了可能稳定的结构。用与上述序列具有相同碱基组成和相同相似程度的随机序列对该程序进行模拟表明,同源序列共有的结构在随机序列中偶然出现的可能性非常小。