Hickson R E, Simon C, Perrey S W
Department of Genetics, University of Hawaii at Manoa, USA.
Mol Biol Evol. 2000 Apr;17(4):530-9. doi: 10.1093/oxfordjournals.molbev.a026333.
The performances of five global multiple-sequence alignment programs (CLUSTAL W, Divide and Conquer, Malign, PileUp, and TreeAlign) were evaluated using part of the animal mitochondrial small subunit (12S) rRNA molecule. Conserved sequence motifs derived from an alignment based on secondary structural information were used to score how well each program aligned a data set of five vertebrate and five invertebrate taxa over a range of parameter values. All of the programs could align the motifs with reasonable accuracy for at least one set of parameter conditions, although if the whole sequence was considered, similarity to the structural alignment was only 25%-34%. Use of small gap costs generally gave more accurate results, although Malign and TreeAlign generated longer alignments when gap costs were low. The programs differed in the consistency of the alignments when gap cost was varied; CLUSTAL W, Divide and Conquer, and TreeAlign were the most accurate and robust, while PileUp performed poorly as gap cost values increased, and the accuracy of Malign fluctuated. Default settings for the programs did not give the best results, and attempting to select similar parameter values in different programs did not always result in more similar alignments. Poor alignment of even well-conserved motifs can occur if these are near sites with insertions or deletions. Since there is no a priori way to determine gap costs and because such costs can vary over the gene, alignment of rRNA sequences, particularly the less well conserved regions, should be treated carefully and aided by secondary structure and conserved motifs. Some motifs are single bases and so are often invisible to alignment programs. Our tests involved the most conserved regions of the 12S rRNA gene, and alignment of less well conserved regions will be more problematic. None of the alignments we examined produced a fully resolved phylogeny for the data set, indicating that this portion of 12S rRNA is insufficient for resolution of distant evolutionary relationships.
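The motif-based scoring described above can be illustrated with a minimal sketch. The abstract does not give the exact scoring formula, so this is an assumed simplification rather than the authors' procedure: a motif position counts as correctly aligned only if the corresponding residue of every taxon falls in the same alignment column. The function names, the `(start, length)` motif encoding in ungapped coordinates, and the toy sequences are all hypothetical.

```python
from typing import Dict, List, Tuple

GAP = "-"


def residue_columns(aligned_row: str) -> List[int]:
    """For each ungapped residue (in order), return its column in the aligned row."""
    return [col for col, ch in enumerate(aligned_row) if ch != GAP]


def motif_score(
    alignment: Dict[str, str],
    motifs: Dict[str, Dict[str, Tuple[int, int]]],
) -> float:
    """Fraction of motif positions that share one column across all taxa.

    alignment: taxon -> gapped row (all rows the same length).
    motifs: motif name -> taxon -> (start, length) in ungapped coordinates
            (a hypothetical encoding chosen for this sketch).
    """
    hits = 0
    total = 0
    for spans in motifs.values():
        columns_by_taxon = {t: residue_columns(alignment[t]) for t in spans}
        length = min(n for _, n in spans.values())
        for offset in range(length):
            # Columns occupied by this motif position in each taxon.
            cols = {columns_by_taxon[t][start + offset] for t, (start, _) in spans.items()}
            hits += len(cols) == 1
            total += 1
    return hits / total if total else 0.0


# Toy data (invented for illustration): two motifs, "AC" and "GGA".
motifs = {
    "m1": {"taxonA": (0, 2), "taxonB": (0, 2), "taxonC": (0, 2)},
    "m2": {"taxonA": (4, 3), "taxonB": (4, 3), "taxonC": (5, 3)},
}
good = {"taxonA": "ACGT-GGA", "taxonB": "AC-TTGGA", "taxonC": "ACGTTGGA"}
bad = {"taxonA": "ACGT-GGA", "taxonB": "ACTTGGA-", "taxonC": "ACGTTGGA"}

print(motif_score(good, motifs))
print(motif_score(bad, motifs))
```

In the `bad` alignment, a single misplaced gap in one taxon shifts the second motif out of register, which mirrors the observation that well-conserved motifs can be misaligned when they lie near insertions or deletions.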