Wang Yi, Li Kuo-Bin
Bioinformatics Institute, Singapore 138671, Singapore.
J Bioinform Comput Biol. 2005 Apr;3(2):243-55. doi: 10.1142/s021972000500103x.
We describe an exhaustive and greedy algorithm for improving the accuracy of multiple sequence alignment. A simple progressive alignment approach is employed to provide initial alignments. The initial alignment is then iteratively optimized against an objective function. For any working alignment, the optimization involves three operations: insertions, deletions and shuffles of gaps. The optimization is exhaustive since the algorithm applies the above operations to all eligible positions of an alignment. It is also greedy since only the operation that gives the best improving objective score will be accepted. The algorithms have been implemented in the EGMA (Exhaustive and Greedy Multiple Alignment) package using Java programming language, and have been evaluated using the BAliBASE benchmark alignment database. Although EGMA is not guaranteed to produce globally optimized alignment, the tests indicate that EGMA is able to build alignments with high quality consistently, compared with other commonly used iterative and non-iterative alignment programs. It is also useful for refining multiple alignments obtained by other methods.
我们描述了一种用于提高多序列比对准确性的穷举贪心算法。采用一种简单的渐进比对方法来提供初始比对。然后针对一个目标函数对初始比对进行迭代优化。对于任何有效的比对,优化涉及三种操作:间隙的插入、删除和重排。该优化是穷举的,因为算法将上述操作应用于比对的所有合格位置。它也是贪心的,因为只接受能给出最佳目标得分改进的操作。这些算法已使用Java编程语言在EGMA(穷举贪心多序列比对)软件包中实现,并使用BAliBASE基准比对数据库进行了评估。尽管EGMA不能保证产生全局最优比对,但测试表明,与其他常用的迭代和非迭代比对程序相比,EGMA能够始终如一地构建高质量的比对。它对于优化通过其他方法获得的多序列比对也很有用。