Interuniversity Institute of Bioinformatics in Brussels, ULB-VUB, La Plaine Campus, Triomflaan.
Structural Biology Brussels, Vrije Universiteit Brussel, Pleinlaan 2.
Bioinformatics. 2017 Dec 15;33(24):3902-3908. doi: 10.1093/bioinformatics/btx391.
Methods that provide reliable protein alignments are crucial for many bioinformatics applications. In recent years many different algorithms have been developed, and various kinds of information, from sequence conservation to secondary structure, have been used to improve alignment performance. This is especially relevant for proteins with highly divergent sequences. However, recent work suggests that different features may differ in importance across protein classes, so more customizable approaches, capable of dealing with different alignment definitions, would be an advantage.
Here we present Rigapollo, a highly flexible pairwise alignment method based on a pairwise HMM-SVM that can use any type of information to build alignments. Rigapollo lets users choose the features best suited to aligning their protein class of interest. It outperforms current state-of-the-art methods on two well-known benchmark datasets when aligning highly divergent sequences.
A Python implementation of the algorithm is available at http://ibsquare.be/rigapollo.
Supplementary data are available at Bioinformatics online.
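The idea of building an alignment from arbitrary per-residue features can be illustrated with a minimal sketch. It is not the Rigapollo implementation and does not reproduce its pairwise HMM-SVM: it substitutes a plain Needleman-Wunsch-style dynamic program with a linear gap penalty, in which the match score is a user-supplied function over two feature vectors. The names `align_features` and `dot_scorer`, the gap value, and the dot-product scorer are all assumptions made for illustration.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Features = Sequence[float]
Scorer = Callable[[Features, Features], float]
Pair = Tuple[Optional[int], Optional[int]]


def dot_scorer(a: Features, b: Features) -> float:
    """Hypothetical match scorer: dot product of two feature vectors."""
    return sum(x * y for x, y in zip(a, b))


def align_features(
    xs: Sequence[Features],
    ys: Sequence[Features],
    score: Scorer = dot_scorer,
    gap: float = -1.0,
) -> Tuple[float, List[Pair]]:
    """Global alignment of two feature-vector sequences (illustrative only).

    Returns the optimal score and a list of (i, j) index pairs, where
    None marks a gap in the corresponding sequence.
    """
    n, m = len(xs), len(ys)
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    # back[i][j] records the move taken: 'D' diagonal, 'U' up, 'L' left.
    back = [[""] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = dp[i - 1][0] + gap
        back[i][0] = "U"
    for j in range(1, m + 1):
        dp[0][j] = dp[0][j - 1] + gap
        back[0][j] = "L"

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            candidates = (
                (dp[i - 1][j - 1] + score(xs[i - 1], ys[j - 1]), "D"),
                (dp[i - 1][j] + gap, "U"),
                (dp[i][j - 1] + gap, "L"),
            )
            # max() keeps the first of equal scores, so ties prefer 'D'.
            dp[i][j], back[i][j] = max(candidates, key=lambda c: c[0])

    # Trace back from the bottom-right corner to recover the alignment.
    pairs: List[Pair] = []
    i, j = n, m
    while i > 0 or j > 0:
        move = back[i][j]
        if move == "D":
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif move == "U":
            pairs.append((i - 1, None))
            i -= 1
        else:
            pairs.append((None, j - 1))
            j -= 1
    pairs.reverse()
    return dp[n][m], pairs
```

Because the scorer is just a callable, any per-residue description (for example one-hot residue identity, conservation profiles, or predicted secondary structure) can be plugged in without changing the dynamic program, which is the kind of flexibility the abstract describes. A trained classifier's decision function could take the place of `dot_scorer` under the same interface.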