Katoh Kazutaka, Kuma Kei-ichi, Toh Hiroyuki, Miyata Takashi
Bioinformatics Center, Institute for Chemical Research, Kyoto University, Uji, Kyoto 611-0011, Japan.
Nucleic Acids Res. 2005 Jan 20;33(2):511-8. doi: 10.1093/nar/gki198. Print 2005.
The accuracy of the multiple sequence alignment program MAFFT has been improved. The new version (5.3) of MAFFT offers new iterative refinement options, H-INS-i, F-INS-i and G-INS-i, in which pairwise alignment information is incorporated into the objective function. These new options showed higher accuracy than currently available methods, including TCoffee version 2 and CLUSTAL W, in benchmark tests consisting of alignments of more than 50 sequences. Like the previously available options, the new options can handle hundreds of sequences on a standard desktop computer. We also examined the effect of the number of homologues included in an alignment. For a multiple alignment consisting of approximately 8 sequences with low similarity, the accuracy improved by 2-10 percentage points when the sequences were aligned together with dozens of their close homologues (E-value below a threshold of 10^-5 to 10^-20) collected from a database. Such improvement was generally observed for most methods, but was remarkably large for the new options of MAFFT proposed here. We therefore wrote a Ruby script, mafftE.rb, which aligns the input sequences together with their close homologues collected from SwissProt using NCBI-BLAST.
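The homologue-collection step described above can be illustrated with a minimal sketch. This is not the code of mafftE.rb, which is not reproduced here; it is a Python illustration under stated assumptions: the BLAST results are available as tab-separated tabular output with the standard 12 columns (subject ID in column 2, E-value in column 11, as produced by `-m 8` in legacy blastall or `-outfmt 6` in BLAST+), and the function name `select_close_homologues` and the `max_hits` cap are hypothetical.

```python
from typing import Dict, List


def select_close_homologues(
    blast_tabular: str,
    max_evalue: float = 1e-5,  # assumed default; the abstract reports thresholds from 1e-5 to 1e-20
    max_hits: int = 50,        # hypothetical cap standing in for "dozens" of homologues
) -> List[str]:
    """Return subject IDs whose best E-value is below max_evalue, best hit first."""
    best: Dict[str, float] = {}  # subject ID -> smallest E-value seen so far
    for line in blast_tabular.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        if len(fields) < 12:
            continue  # not a standard 12-column tabular record
        subject_id = fields[1]
        evalue = float(fields[10])
        # Keep only close homologues (strictly below the threshold),
        # remembering the best E-value per subject sequence.
        if evalue < max_evalue and evalue < best.get(subject_id, float("inf")):
            best[subject_id] = evalue
    ranked = sorted(best, key=lambda sid: best[sid])
    return ranked[:max_hits]
```

The returned identifiers would then be used to fetch the corresponding sequences, which are appended to the input sequences before running the alignment. Tightening `max_evalue` toward 1e-20 keeps only the closest homologues.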