Knudsen B, Andersen E S, Damgaard C, Kjems J, Gorodkin J
Bioinformatics Research Center, Høegh Guldbergsgade 10, University of Aarhus, DK-8000 Aarhus C, Denmark.
Comput Biol Chem. 2004 Jul;28(3):219-26. doi: 10.1016/j.compbiolchem.2004.04.001.
RNA secondary structure can be predicted from evolutionary history by using an alignment of related RNA sequences with conserved structure. Methods of this type depend on accurate evolutionary substitution rates for base pairs and single-stranded nucleotides. These rates are hard to determine reliably without a large and accurate initial alignment, ideally one with structural annotation. Hence, one must often apply rates extracted from other RNA families with trusted alignments and structures. Here, we investigate this problem by applying rates derived from tRNA and rRNA to structure prediction for the much more rapidly evolving 5'-region of HIV-1. We find that the HIV-1 prediction agrees with experimental data, even though the relative evolutionary rate between A and G is significantly increased in both stem and loop regions. In addition, we obtained an alignment of the 5' HIV-1 region that is more consistent with the structure than the one currently in the database. To investigate how stable the predictions are to deviations in the rate matrix, we added randomized noise to the original rate values. We find that changes within a fairly large range still produce reliable predictions, and we conclude that rates from a limited set of RNA sequences remain valid over a broader range of sequences.
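The noise experiment described in the abstract can be illustrated with a short sketch. The abstract does not say how the noise was drawn or applied, so everything here is an assumption made for illustration: the function name `perturb_rate_matrix`, the uniform multiplicative noise model, the `noise_level` parameter, and the example rate values (which are invented and not taken from the paper). The sketch scales each off-diagonal rate by a random factor and then resets the diagonal so that every row sums to zero, which is the usual constraint on a continuous-time substitution rate matrix.

```python
import random


def perturb_rate_matrix(rates, noise_level, rng):
    """Return a noisy copy of a square substitution rate matrix.

    Assumed noise model (not specified in the abstract): each off-diagonal
    rate is multiplied by a factor drawn uniformly from
    [1 - noise_level, 1 + noise_level]. Each diagonal entry is then reset
    so that its row sums to zero.
    """
    if not 0.0 <= noise_level < 1.0:
        raise ValueError("noise_level must be in [0, 1)")
    n = len(rates)
    noisy = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                factor = 1.0 + rng.uniform(-noise_level, noise_level)
                noisy[i][j] = rates[i][j] * factor
        # The diagonal is the negative sum of the off-diagonal rates.
        noisy[i][i] = -sum(noisy[i][j] for j in range(n) if j != i)
    return noisy


# Hypothetical single-stranded rates in the order A, C, G, U.
# These numbers are placeholders, not values from the paper.
EXAMPLE_RATES = [
    [-1.0, 0.2, 0.6, 0.2],
    [0.2, -1.0, 0.2, 0.6],
    [0.6, 0.2, -1.0, 0.2],
    [0.2, 0.6, 0.2, -1.0],
]

rng = random.Random(0)  # fixed seed so the perturbation is reproducible
noisy_rates = perturb_rate_matrix(EXAMPLE_RATES, 0.3, rng)
```

In a stability study of this kind, one would rerun the structure prediction with many such perturbed matrices at increasing `noise_level` and compare each predicted structure with the unperturbed prediction. The same function works for a 16 x 16 base-pair rate matrix, since it only assumes a square matrix.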