Department of Computer Architecture and Computer Technology, University of Granada, 18071 Granada, Spain.
Nucleic Acids Res. 2013 Jan 7;41(1):e26. doi: 10.1093/nar/gks919. Epub 2012 Oct 11.
Multiple sequence alignments (MSAs) have become one of the most studied approaches in bioinformatics and are used in other key tasks such as structure prediction, biological function analysis and next-generation sequencing. However, current MSA algorithms do not always provide consistent solutions, since alignment becomes increasingly difficult for sequences of low similarity. As is widely known, these algorithms depend directly on specific features of the sequences, and these features strongly influence alignment accuracy. Many MSA tools have been designed recently, but it is not possible to know in advance which one is the most suitable for a particular set of sequences. In this work, we analyze some of the most widely used algorithms in the literature and their dependence on several features. A novel intelligent algorithm based on a least-squares support vector machine (LS-SVM) is then developed to predict how accurate each alignment could be, given its analyzed features. The algorithm is applied to a dataset of 2180 MSAs. The proposed system first estimates the accuracy of the possible alignments; the most promising methodology is then selected to align each set of sequences. Since only the selected algorithm is run, the computational time is not excessively increased.
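The selection scheme described above (predict an accuracy score for each candidate aligner from sequence features, then run only the top-ranked one) can be sketched as follows. This is a minimal, hypothetical illustration rather than the authors' implementation: the RBF kernel, the hyperparameters `gamma` and `sigma`, the single input feature, the aligner names `tool_A` and `tool_B`, and the toy accuracy values are all assumptions. Each regressor follows the standard LS-SVM formulation, in which training reduces to solving one linear system instead of a quadratic program.

```python
import math


def rbf_kernel(x, z, sigma=1.0):
    """Gaussian (RBF) kernel; the kernel choice and sigma are assumptions."""
    d2 = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-d2 / (2.0 * sigma ** 2))


def solve_linear(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [v] for row, v in zip(A, rhs)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


class LSSVMRegressor:
    """LS-SVM regression. Fitting solves the linear system
        [ 0   1^T           ] [ b     ]   [ 0 ]
        [ 1   K + I / gamma ] [ alpha ] = [ y ]
    and prediction computes f(x) = b + sum_i alpha_i * k(x, x_i).
    """

    def __init__(self, gamma=10.0, sigma=1.0):
        self.gamma = gamma  # regularization constant (assumed value)
        self.sigma = sigma  # RBF width (assumed value)

    def fit(self, X, y):
        n = len(X)
        A = [[0.0] + [1.0] * n]  # first row: sum of alphas equals 0
        for i in range(n):
            row = [1.0]  # coefficient of the bias b
            for j in range(n):
                k = rbf_kernel(X[i], X[j], self.sigma)
                row.append(k + (1.0 / self.gamma if i == j else 0.0))
            A.append(row)
        sol = solve_linear(A, [0.0] + list(y))
        self.b, self.alpha = sol[0], sol[1:]
        self.X = [list(x) for x in X]
        return self

    def predict(self, x):
        return self.b + sum(a * rbf_kernel(x, xi, self.sigma)
                            for a, xi in zip(self.alpha, self.X))


def select_aligner(models, features):
    """Predict each aligner's accuracy; return the best name and all predictions."""
    preds = {name: model.predict(features) for name, model in models.items()}
    return max(preds, key=preds.get), preds


# Synthetic training data (hypothetical): one feature per sequence set,
# e.g. mean pairwise identity, and the accuracy each aligner achieved on it.
X_train = [[0.2], [0.4], [0.6], [0.8]]
models = {
    "tool_A": LSSVMRegressor().fit(X_train, [0.3, 0.5, 0.7, 0.9]),
    "tool_B": LSSVMRegressor().fit(X_train, [0.6, 0.6, 0.6, 0.6]),
}

best, preds = select_aligner(models, [0.9])
print(best)  # tool_A
```

Only the aligner returned by `select_aligner` would then be run on the new sequence set, so the extra cost per query is one feature extraction plus a few kernel evaluations per candidate tool. The solver is written in pure Python only to keep the sketch dependency-free; a numerical library would normally handle the linear system.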