Mathews David H, Turner Douglas H
Department of Chemistry, University of Rochester, NY 14627-0216, USA.
J Mol Biol. 2002 Mar 22;317(2):191-203. doi: 10.1006/jmbi.2001.5351.
With the rapid increase in the size of the genome sequence database, computational analysis of RNA will become increasingly important in revealing structure-function relationships and potential drug targets. RNA secondary structure prediction for a single sequence is 73% accurate on average for a large database of known secondary structures. This level of accuracy provides a good starting point for determining a secondary structure either by comparative sequence analysis or by the interpretation of experimental studies. Dynalign is a new computer algorithm that improves the accuracy of structure prediction by combining free energy minimization and comparative sequence analysis to find a low free energy structure common to two sequences without requiring any sequence identity. It uses a dynamic programming construct suggested by Sankoff. Dynalign, however, restricts the maximum distance, M, allowed between aligned nucleotides in the two sequences. This makes the calculation tractable because the complexity is simplified to $O(M^3 N^3)$, where N is the length of the shorter sequence. The accuracy of Dynalign was tested with sets of 13 tRNAs, seven 5S rRNAs, and two R2 3' UTR sequences. On average, Dynalign predicted 86.1% of known base-pairs in the tRNAs, as compared to 59.7% for free energy minimization alone. For the 5S rRNAs, the average accuracy improves from 47.8% to 86.4%. The secondary structure of the R2 3' UTR from Drosophila takahashii is poorly predicted by standard free energy minimization. With Dynalign, however, the structure predicted in tandem with the sequence from Drosophila melanogaster nearly matches the structure determined by comparative sequence analysis.
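As a rough illustration of two ideas in the abstract, the Python sketch below (not taken from the Dynalign source) counts how many index tuples a Sankoff-style table would need with and without a bound on the distance between aligned nucleotides, and computes the percentage of known base-pairs that a prediction recovers. The function names, the exact form of the constraint (|i - k| <= M and |j - l| <= M), and the exact-match scoring rule are all assumptions made for illustration; the paper's own definitions may differ.

```python
def count_table_entries(n1, n2, max_sep=None):
    """Count tuples (i, j, k, l) with i <= j in sequence 1 and k <= l in sequence 2.

    If max_sep is given, keep only tuples with |i - k| <= max_sep and
    |j - l| <= max_sep. This is an assumed form of the M constraint.
    """
    count = 0
    for i in range(n1):
        for j in range(i, n1):
            for k in range(n2):
                # Skip 5' ends that are aligned too far apart.
                if max_sep is not None and abs(i - k) > max_sep:
                    continue
                for l in range(k, n2):
                    # Skip 3' ends that are aligned too far apart.
                    if max_sep is not None and abs(j - l) > max_sep:
                        continue
                    count += 1
    return count


def percent_known_pairs_predicted(known_pairs, predicted_pairs):
    """Percentage of known base-pairs that also appear in the prediction.

    Pairs are (i, j) position tuples; order within a pair is ignored.
    Only exact matches are counted (an assumption of this sketch).
    """
    known = {tuple(sorted(pair)) for pair in known_pairs}
    predicted = {tuple(sorted(pair)) for pair in predicted_pairs}
    if not known:
        raise ValueError("known_pairs must not be empty")
    return 100.0 * len(known & predicted) / len(known)


if __name__ == "__main__":
    n = 12  # toy sequence length, far shorter than a real tRNA
    print("unconstrained entries:", count_table_entries(n, n))
    print("entries with M = 2:", count_table_entries(n, n, max_sep=2))

    known = [(1, 10), (2, 9), (3, 8), (4, 7)]
    predicted = [(1, 10), (9, 2), (5, 6)]
    print("percent of known pairs predicted:",
          percent_known_pairs_predicted(known, predicted))
```

The count covers only the table size. The work done per entry (for example, searching over branch points) is not modelled here, so the sketch does not by itself reproduce the stated complexity.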