Saigo Hiroto, Vert Jean-Philippe, Akutsu Tatsuya
Bioinformatics Center, Institute for Chemical Research, Kyoto University, Uji, 611-0011, Japan.
BMC Bioinformatics. 2006 May 5;7:246. doi: 10.1186/1471-2105-7-246.
Detecting remote homologies by direct comparison of protein sequences remains a challenging task. We previously developed a similarity score between sequences, called a local alignment kernel, that performs well on this task in combination with a support vector machine. The local alignment kernel depends on an amino acid substitution matrix. Because the commonly used BLOSUM and PAM matrices for scoring amino acid matches were optimized for use with the Smith-Waterman algorithm, the matrices that are optimal for the local alignment kernel may be different.
Unlike the local alignment score computed by the Smith-Waterman algorithm, the local alignment kernel is differentiable with respect to the amino acid substitution scores, and its derivative can be computed efficiently by dynamic programming. We optimized the substitution matrix by classical gradient descent on an objective function that measures how well the local alignment kernel discriminates homologs from non-homologs in the COG database. The local alignment kernel performs better with the matrices and gap parameters optimized by this procedure than with the matrices optimized for the Smith-Waterman algorithm. Furthermore, the matrices and gap parameters optimized for the local alignment kernel can also be used successfully by the Smith-Waterman algorithm.
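To illustrate why such a kernel is differentiable where a maximum-based alignment score is not, the following is a minimal sketch, not the authors' implementation. It sums exponentiated scores over all local alignments with a three-state affine-gap dynamic program, and propagates the derivative with respect to one substitution score through the same recursion. The function name `la_kernel_with_grad`, the two-letter toy alphabet, the score values, the gap penalties, and the scaling parameter `beta` are all assumptions made for illustration.

```python
import math

def la_kernel_with_grad(x, y, S, wrt, beta=0.5, gap_open=3.0, gap_ext=1.0):
    """Return (K, dK/dS[wrt]) for sequences x and y.

    S maps a sorted residue pair, e.g. ("A", "C"), to a substitution score.
    K sums exp(beta * score) over all local alignments (including the empty
    one); gap_open and gap_ext are positive penalties (assumed values).
    """
    n, m = len(x), len(y)
    eo = math.exp(-beta * gap_open)   # weight for opening a gap
    ee = math.exp(-beta * gap_ext)    # weight for extending a gap

    def table():
        # Each cell holds [value, derivative with respect to S[wrt]].
        return [[[0.0, 0.0] for _ in range(m + 1)] for _ in range(n + 1)]

    M, X, Y, X2, Y2 = table(), table(), table(), table(), table()
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            pair = tuple(sorted((x[i - 1], y[j - 1])))
            w = math.exp(beta * S[pair])
            dw = beta * w if pair == wrt else 0.0
            # Match state: product rule gives the derivative.
            p = 1.0 + M[i-1][j-1][0] + X[i-1][j-1][0] + Y[i-1][j-1][0]
            dp = M[i-1][j-1][1] + X[i-1][j-1][1] + Y[i-1][j-1][1]
            M[i][j] = [w * p, w * dp + dw * p]
            # Gap and trailing states are linear in earlier cells, so the
            # value (k=0) and derivative (k=1) follow the same update.
            for k in (0, 1):
                X[i][j][k] = eo * M[i-1][j][k] + ee * X[i-1][j][k]
                Y[i][j][k] = (eo * (M[i][j-1][k] + X[i][j-1][k])
                              + ee * Y[i][j-1][k])
                X2[i][j][k] = M[i-1][j][k] + X2[i-1][j][k]
                Y2[i][j][k] = M[i][j-1][k] + X2[i][j-1][k] + Y2[i][j-1][k]

    K = 1.0 + X2[n][m][0] + Y2[n][m][0] + M[n][m][0]
    dK = X2[n][m][1] + Y2[n][m][1] + M[n][m][1]
    return K, dK

# Toy usage with made-up scores over a two-letter alphabet.
toy_scores = {("A", "A"): 2.0, ("A", "C"): -1.0, ("C", "C"): 3.0}
K_toy, dK_toy = la_kernel_with_grad("ACCA", "CAC", toy_scores, ("A", "C"))
```

Because every step is a sum or product of smooth terms, the derivative costs the same order of work as the kernel itself. Replacing the sums by maxima would recover a Smith-Waterman style score, which is only piecewise linear in the substitution scores.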
This optimization procedure leads to useful substitution matrices, both for the local alignment kernel and the Smith-Waterman algorithm. The best performance for homology detection is obtained by the local alignment kernel.