Department of Computer Engineering, Shiraz Branch, Islamic Azad University, Shiraz, Iran.
Department of Computer Engineering, Sharif University of Technology, Tehran, Iran.
Bioinformatics. 2020 Jun 1;36(12):3662-3668. doi: 10.1093/bioinformatics/btaa175.
Multiple sequence alignment (MSA) is an important and challenging problem in computational biology. Most existing methods can produce only short multiple alignments in an acceptable time. When researchers work with genome-size sequences, multiple alignment requires a huge amount of processing space and time. A method that can align genome-size sequences rapidly and precisely therefore has a great effect, especially on the analysis of very long alignments. Herein, we propose an efficient method, called FAME, which vertically divides the sequences at the positions where they share common regions and arranges these regions in consecutive order. The common regions are then shifted so that they lie directly under one another, and the subsequences between them are aligned using any existing MSA tool.
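The divide-and-align idea described above can be illustrated with a minimal Python sketch. It is not the FAME implementation: the function names, the use of k-mers that occur exactly once in every sequence as "common regions", the greedy ordering of those regions, and the `pad_align` placeholder (which only pads with gaps and stands in for a real MSA tool) are all assumptions made for illustration.

```python
from collections import Counter


def unique_kmers(seq, k):
    """Map each k-mer that occurs exactly once in seq to its start position."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return {kmer: seq.find(kmer) for kmer, n in counts.items() if n == 1}


def common_anchors(seqs, k):
    """Greedily pick shared unique k-mers that appear in the same order
    in every sequence and do not overlap (hypothetical anchor rule)."""
    tables = [unique_kmers(s, k) for s in seqs]
    shared = set(tables[0]).intersection(*tables[1:])
    candidates = sorted(shared, key=lambda kmer: tables[0][kmer])
    anchors, last_end = [], [0] * len(seqs)
    for kmer in candidates:
        pos = [t[kmer] for t in tables]
        if all(p >= e for p, e in zip(pos, last_end)):
            anchors.append(pos)
            last_end = [p + k for p in pos]
    return anchors


def split_at_anchors(seqs, anchors, k):
    """Cut every sequence into alternating in-between segments and anchors."""
    blocks, prev = [], [0] * len(seqs)
    for pos in anchors:
        blocks.append(("between", [s[a:b] for s, a, b in zip(seqs, prev, pos)]))
        blocks.append(("anchor", [s[p:p + k] for s, p in zip(seqs, pos)]))
        prev = [p + k for p in pos]
    blocks.append(("between", [s[a:] for s, a in zip(seqs, prev)]))
    return blocks


def pad_align(segments):
    """Placeholder for an external MSA tool: right-pad segments with gaps."""
    width = max(len(s) for s in segments)
    return [s.ljust(width, "-") for s in segments]


def anchored_alignment(seqs, k, align=pad_align):
    """Place anchors under each other and align only the segments between them."""
    rows = [""] * len(seqs)
    for kind, parts in split_at_anchors(seqs, common_anchors(seqs, k), k):
        aligned = parts if kind == "anchor" else align(parts)
        rows = [r + a for r, a in zip(rows, aligned)]
    return rows


rows = anchored_alignment(["AAGATTACACC", "TGATTACAG", "GATTACATTT"], k=7)
print("\n".join(rows))
```

In this sketch the `align` argument is where a real MSA tool would be called on each group of in-between segments. Because those segments are much shorter than the full sequences, each call handles a far smaller problem.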
The results demonstrate that the combination of FAME with existing MSA methods, together with the use of minimizers, can be executed on a personal computer and accurately aligns long sequences with a much higher sum-of-pairs (SP) score than the standalone MSA tools. As genomic datasets with longer sequences are selected, the SP score of the combined methods gradually improves. The calculated computational complexity of the methods supports these results: combining FAME with the MSA tools leads to at least four times faster execution on the datasets.
The source code, all datasets and run parameters are freely available at http://github.com/naznoosh/msa.
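For readers unfamiliar with the sum-of-pairs measure, the following Python sketch shows one common way to compute it: every column of the alignment is scored by summing a pairwise score over all pairs of rows. The scoring values (match +1, mismatch -1, gap -2, gap against gap 0) and the function name are assumptions for illustration. The abstract does not state which SP variant or scoring scheme was used in the evaluation.

```python
from itertools import combinations


def sp_score(rows, match=1, mismatch=-1, gap=-2):
    """Sum-of-pairs score of an alignment given as equal-length strings.

    The default scoring values are illustrative assumptions.
    """
    if len({len(r) for r in rows}) > 1:
        raise ValueError("all aligned rows must have the same length")
    total = 0
    for column in zip(*rows):
        for a, b in combinations(column, 2):
            if a == "-" and b == "-":
                continue  # a gap paired with a gap contributes nothing
            if a == "-" or b == "-":
                total += gap
            elif a == b:
                total += match
            else:
                total += mismatch
    return total


print(sp_score(["AC-", "ACG", "A-G"]))
```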
Supplementary data are available at Bioinformatics online.