Katoh Kazutaka, Misawa Kazuharu, Kuma Kei-ichi, Miyata Takashi
Department of Biophysics, Graduate School of Science, Kyoto University, Kyoto 606-8502, Japan.
Nucleic Acids Res. 2002 Jul 15;30(14):3059-66. doi: 10.1093/nar/gkf436.
We have developed MAFFT, a multiple sequence alignment program that drastically reduces CPU time compared with existing methods. MAFFT includes two novel techniques. (i) Homologous regions are rapidly identified by the fast Fourier transform (FFT), in which an amino acid sequence is converted to a sequence composed of the volume and polarity values of each amino acid residue. (ii) We propose a simplified scoring system that reduces CPU time and increases alignment accuracy, even for sequences with large insertions or extensions and for distantly related sequences of similar length. Two heuristics are implemented in MAFFT: the progressive method (FFT-NS-2) and the iterative refinement method (FFT-NS-i). The performance of FFT-NS-2 and FFT-NS-i was compared with that of other methods by computer simulations and benchmark tests. FFT-NS-2 requires drastically less CPU time than CLUSTALW while achieving comparable accuracy. FFT-NS-i is over 100 times faster than T-COFFEE when the number of input sequences exceeds 60, without sacrificing accuracy.
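The FFT step in technique (i) can be illustrated with a minimal sketch. It is not MAFFT's implementation. It assumes that each residue maps to a (volume, polarity) pair, that the two property signals of a sequence pair are cross-correlated through the FFT, and that the lag with the highest summed correlation suggests where homologous regions line up. The table TOY_PROPS, its numeric values, and the function names encode, lag_scores and best_lag are hypothetical placeholders; the abstract does not give the actual property values or the peak-selection procedure.

```python
import numpy as np

# Hypothetical, normalized (volume, polarity) values for a few residues.
# Placeholders for illustration only; NOT the values used by MAFFT.
TOY_PROPS = {
    "G": (0.0, 0.0),
    "L": (1.0, -1.0),
    "K": (0.5, 1.0),
    "D": (-0.5, 1.5),
    "W": (2.0, -0.5),
}


def encode(seq, props=TOY_PROPS):
    """Convert a residue string into an array of shape (len(seq), 2)."""
    return np.array([props[res] for res in seq], dtype=float)


def lag_scores(seq1, seq2, props=TOY_PROPS):
    """Return (lags, scores): the cross-correlation of the two sequences,
    summed over the volume and polarity channels, for every possible lag.

    A lag k means residue i + k of seq1 is paired with residue i of seq2.
    """
    a, b = encode(seq1, props), encode(seq2, props)
    n1, n2 = len(seq1), len(seq2)
    # Zero-pad to a power of two >= n1 + n2 - 1 so that the circular
    # correlation computed by the FFT equals the linear correlation.
    size = 1 << (n1 + n2 - 2).bit_length()
    total = np.zeros(size)
    for channel in range(a.shape[1]):
        fa = np.fft.rfft(a[:, channel], size)
        fb = np.fft.rfft(b[:, channel], size)
        # Correlation theorem: corr(a, b) = IFFT(FFT(a) * conj(FFT(b))).
        total += np.fft.irfft(fa * np.conj(fb), size)
    # Negative lags wrap around to the end of the FFT output.
    lags = np.arange(-(n2 - 1), n1)
    return lags, total[lags % size]


def best_lag(seq1, seq2, props=TOY_PROPS):
    """Return the lag with the highest summed correlation."""
    lags, scores = lag_scores(seq1, seq2, props)
    return int(lags[np.argmax(scores)])


if __name__ == "__main__":
    # The motif WLKDW starts at index 5 in the first sequence
    # and at index 1 in the second.
    print(best_lag("GGGGGWLKDWGG", "GWLKDWGGGG"))
```

In this toy input the shared motif starts at index 5 in the first sequence and index 1 in the second, so the correlation peaks at lag 4. Computing all lags through the FFT costs O(N log N) rather than the O(N^2) of a direct sliding comparison, which is consistent with the speed-up the abstract attributes to the FFT.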