Department of Computer Engineering, Firat University, 23119 Elazig, Turkey.
Department of Computer Science, University of Calgary, Calgary, AB, Canada.
Comput Methods Programs Biomed. 2014 Apr;114(1):38-49. doi: 10.1016/j.cmpb.2014.01.013. Epub 2014 Jan 31.
Multiple sequence alignment is of central importance to bioinformatics and computational biology. Although a large number of algorithms for computing a multiple sequence alignment have been designed, the efficient computation of highly accurate and statistically significant multiple alignments is still a challenge. In this paper, we propose an efficient method that uses a multi-objective genetic algorithm (MSAGMOGA) to discover optimal alignments with affine gaps in multiple sequence data. The main advantage of our approach is that a large number of tradeoff (i.e., non-dominated) alignments can be obtained in a single run with respect to conflicting objectives: affine gap penalty minimization, and similarity and support maximization. To the best of our knowledge, this is the first effort with three objectives in this direction. The proposed method can be applied to any data set with a sequential character. Furthermore, it allows any choice of similarity measure for finding alignments. By analyzing the obtained optimal alignments, the decision maker can understand the tradeoff between the objectives. We compared our method with three well-known multiple sequence alignment methods: MUSCLE, SAGA and MSA-GA. The first is a progressive method, and the other two are based on evolutionary algorithms. Experiments on the BAliBASE 2.0 database were conducted, and the results confirm that MSAGMOGA obtains alignments with better accuracy and statistical significance than the three well-known methods when aligning multiple sequences with affine gaps. The proposed method also finds solutions faster than the other evolutionary approaches mentioned above.
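The two ideas named in the abstract, an affine gap penalty and a set of non-dominated alignments over three objectives, can be illustrated with a minimal Python sketch. This is not the MSAGMOGA implementation. The names `affine_gap_penalty`, `Scores`, `dominates` and `non_dominated`, the default penalty values, and the convention that a gap run of length k costs `gap_open + (k - 1) * gap_extend` are assumptions made for illustration. The similarity and support values are taken as precomputed numbers, since the abstract does not define how they are calculated.

```python
from typing import List, NamedTuple

GAP = "-"


def affine_gap_penalty(row: str, gap_open: float = 10.0, gap_extend: float = 0.5) -> float:
    """Affine penalty of one aligned row.

    Assumed convention: the first gap symbol of a run costs gap_open and
    every further symbol in the same run costs gap_extend.
    """
    penalty = 0.0
    in_gap = False
    for ch in row:
        if ch == GAP:
            penalty += gap_extend if in_gap else gap_open
            in_gap = True
        else:
            in_gap = False
    return penalty


class Scores(NamedTuple):
    """Objective values of one candidate alignment (hypothetical container)."""
    gap_penalty: float  # to be minimized
    similarity: float   # to be maximized
    support: float      # to be maximized


def dominates(a: Scores, b: Scores) -> bool:
    """True if a is no worse than b on all objectives and strictly better on one."""
    no_worse = (
        a.gap_penalty <= b.gap_penalty
        and a.similarity >= b.similarity
        and a.support >= b.support
    )
    strictly_better = (
        a.gap_penalty < b.gap_penalty
        or a.similarity > b.similarity
        or a.support > b.support
    )
    return no_worse and strictly_better


def non_dominated(candidates: List[Scores]) -> List[Scores]:
    """Keep the candidates that no other candidate dominates."""
    return [c for c in candidates if not any(dominates(o, c) for o in candidates)]


if __name__ == "__main__":
    # Made-up objective values, used only to exercise the functions.
    candidates = [
        Scores(affine_gap_penalty("AC--G-T"), 30.0, 0.8),
        Scores(10.0, 25.0, 0.9),
        Scores(25.0, 28.0, 0.7),
    ]
    for s in non_dominated(candidates):
        print(s)
```

In this toy input the third candidate is worse than the first on all three objectives and is discarded, while the first two survive because each is better on at least one objective. A decision maker would then choose among such survivors.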