SSALN: an alignment algorithm using structure-dependent substitution matrices and gap penalties learned from structurally aligned protein pairs.

Author information

Qiu Jian, Elber Ron

Affiliation

Department of Computer Science, Cornell University, Ithaca, New York 14853, USA.

Publication information

Proteins. 2006 Mar 1;62(4):881-91. doi: 10.1002/prot.20854.

DOI:10.1002/prot.20854

PMID:16385554

Abstract

In template-based modeling of protein structures, the generation of the alignment between the target and the template is a critical step that significantly affects the accuracy of the final model. This paper proposes an alignment algorithm, SSALN, that learns substitution matrices and position-specific gap penalties from a database of structurally aligned protein pairs. In addition to the amino acid sequence information, the secondary structure and solvent accessibility of each position are used to derive substitution scores and position-specific gap penalties. On a test set of CASP5 targets, SSALN outperforms sequence alignment methods such as the Smith-Waterman algorithm with BLOSUM50 and PSI-BLAST. SSALN also generates better alignments than PSI-BLAST on the CASP6 test set. The LOOPP server prediction based on an SSALN alignment is ranked the best for target T0280_1 in CASP6. SSALN is also compared with several threading methods and sequence alignment methods on the ProSup benchmark, where it has the highest alignment accuracy among the methods compared. On Fischer's benchmark, SSALN performs better than CLUSTALW and GenTHREADER, and generates more alignments with accuracy >50%, >60% or >70% than FUGUE, but fewer alignments with accuracy >80% than FUGUE. All the supplemental materials can be found at http://www.cs.cornell.edu/~jianq/research.htm.
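To make the idea of structure-dependent scoring concrete, the following is a minimal sketch of a Smith-Waterman-style local alignment in which both the substitution score and the gap penalty depend on the secondary structure of the template position. It is not the SSALN implementation: the function names (`local_align_score`, `toy_sub_score`, `toy_gap_penalty`), the numeric values, the linear gap model, and the omission of solvent accessibility are all assumptions made for illustration, whereas SSALN learns its parameters from structurally aligned protein pairs.

```python
from typing import Callable


def toy_sub_score(a: str, b: str, ss: str) -> float:
    """Hypothetical substitution score: match/mismatch, weighted more
    heavily when the template position is in a helix (H) or strand (E)."""
    base = 2.0 if a == b else -1.0
    weight = 1.5 if ss in "HE" else 1.0
    return base * weight


def toy_gap_penalty(ss: str) -> float:
    """Hypothetical position-specific gap penalty: gaps are assumed to be
    costlier inside regular secondary structure than in coil (C)."""
    return 4.0 if ss in "HE" else 1.0


def local_align_score(
    target: str,
    template: str,
    template_ss: str,
    sub_score: Callable[[str, str, str], float] = toy_sub_score,
    gap_penalty: Callable[[str], float] = toy_gap_penalty,
) -> float:
    """Best local alignment score with linear, position-specific gaps."""
    if len(template) != len(template_ss):
        raise ValueError("template and template_ss must have equal length")
    n, m = len(target), len(template)
    # H[i][j]: best score of a local alignment ending at target[i-1], template[j-1]
    H = [[0.0] * (m + 1) for _ in range(n + 1)]
    best = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            ss = template_ss[j - 1]
            diag = H[i - 1][j - 1] + sub_score(target[i - 1], template[j - 1], ss)
            up = H[i - 1][j] - gap_penalty(ss)    # target residue left unaligned
            left = H[i][j - 1] - gap_penalty(ss)  # template residue left unaligned
            H[i][j] = max(0.0, diag, up, left)
            best = max(best, H[i][j])
    return best


if __name__ == "__main__":
    # The same sequences score differently depending on where the gap falls.
    print(local_align_score("ACDE", "ACGDE", "HHCHH"))  # gap in a coil position
    print(local_align_score("ACDE", "ACGDE", "HHHHH"))  # gap inside a helix
```

In this toy setup, skipping a template residue annotated as coil is cheaper than skipping one inside a helix, so the alignment is steered toward placing gaps in loop regions. That is the qualitative behavior that position-specific gap penalties are meant to provide.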
