Gardner Paul P, Wilm Andreas, Washietl Stefan
Department of Evolutionary Biology, University of Copenhagen, Universitetsparken 15, 2100 Copenhagen Ø, Denmark.
Nucleic Acids Res. 2005 Apr 28;33(8):2433-9. doi: 10.1093/nar/gki541. Print 2005.
To date, few attempts have been made to benchmark alignment algorithms on nucleic acid sequences. Sophisticated PAM- or BLOSUM-like models are frequently used to align proteins, yet equivalents are not considered for nucleic acids; instead, rather ad hoc models are generally favoured. Here, we systematically test the performance of existing alignment algorithms on structural RNAs. This work had two goals: (i) to determine the conditions under which it is appropriate to apply common sequence alignment methods to the structural RNA alignment problem, which indicates where and when researchers should consider augmenting the alignment process with auxiliary information such as secondary structure; and (ii) to determine which sequence alignment algorithms perform well under the broadest range of conditions. First, we find that sequence alignment alone, using current algorithms, is generally inappropriate below 50–60% sequence identity. Second, we note that the probabilistic method ProAlign and the aging Clustal algorithms generally outperform other sequence-based algorithms across the broadest range of applications.
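The 50–60% threshold above is expressed in terms of sequence identity, a quantity with several competing definitions. The sketch below is a minimal illustration, not the benchmark's own code. It assumes identity is computed over aligned columns where neither sequence has a gap, treats T and U as equivalent, and averages over all pairs in a multiple alignment. The function names, the example alignment, and the 60% cutoff are hypothetical choices made for illustration.

```python
from itertools import combinations


def percent_identity(a: str, b: str, gap: str = "-") -> float:
    """Percent identity between two rows of an alignment.

    Assumption: identity = identical columns / columns where neither row
    has a gap. Other conventions divide by the full alignment length or by
    the length of the shorter ungapped sequence.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    # Normalise case and treat DNA T as RNA U so both compare equal.
    a = a.upper().replace("T", "U")
    b = b.upper().replace("T", "U")
    matches = 0
    compared = 0
    for x, y in zip(a, b):
        if x == gap or y == gap:
            continue  # skip columns containing a gap in either row
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def mean_pairwise_identity(rows: list[str]) -> float:
    """Average percent identity over all pairs of alignment rows."""
    pairs = list(combinations(rows, 2))
    if not pairs:
        raise ValueError("need at least two aligned sequences")
    return sum(percent_identity(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    # Hypothetical toy alignment of three short RNA fragments.
    rows = ["GGCUA-GCU", "GGCUAAGCA", "GAC-AAGCU"]
    mpi = mean_pairwise_identity(rows)
    # Hypothetical cutoff taken from the upper end of the 50-60% range.
    if mpi < 60.0:
        print(f"{mpi:.1f}% identity: consider structure-aware alignment")
    else:
        print(f"{mpi:.1f}% identity: sequence alignment may suffice")
```

Excluding gapped columns from the denominator is only one convention. Dividing by the full alignment length instead gives lower identities for gappy alignments, so any cutoff should be applied with the same definition that was used to derive it.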