Katoh Kazutaka, Toh Hiroyuki
Digital Medicine Initiative, Kyushu University, Fukuoka 812-8582, Japan.
Bioinformatics. 2007 Feb 1;23(3):372-4. doi: 10.1093/bioinformatics/btl592. Epub 2006 Nov 21.
To construct a multiple sequence alignment (MSA) of a large number (> approximately 10,000) of sequences, the calculation of a guide tree with a complexity of O(N2) to O(N3), where N is the number of sequences, is the most time-consuming process.
To overcome this limitation, we have developed an approximate algorithm, PartTree, to construct a guide tree with an average time complexity of O(N log N). The new MSA method with the PartTree algorithm can align approximately 60,000 sequences in several minutes on a standard desktop computer. The loss of accuracy in MSA caused by this approximation was estimated to be several percent in benchmark tests using Pfam.
The present algorithm has been implemented in the MAFFT sequence alignment package (http://align.bmr.kyushu-u.ac.jp/mafft/software/).
Supplementary information is available at Bioinformatics online.
为了构建大量(>约10,000)序列的多序列比对(MSA),计算复杂度为O(N2)到O(N3)(其中N是序列数量)的引导树是最耗时的过程。
为克服这一限制,我们开发了一种近似算法PartTree,用于构建平均时间复杂度为O(N log N)的引导树。采用PartTree算法的新MSA方法在标准台式计算机上几分钟内就能比对约60,000个序列。在使用Pfam进行的基准测试中,这种近似导致的MSA准确性损失估计为百分之几。
当前算法已在MAFFT序列比对软件包(http://align.bmr.kyushu-u.ac.jp/mafft/software/)中实现。
补充信息可在《生物信息学》在线获取。