A comparison of scoring functions for protein sequence profile alignment.

Author information

Edgar Robert C, Sjölander Kimmen

Publication information

Bioinformatics. 2004 May 22;20(8):1301-8. doi: 10.1093/bioinformatics/bth090. Epub 2004 Feb 12.

DOI:10.1093/bioinformatics/bth090

PMID:14962936

Abstract

MOTIVATION

In recent years, several methods have been proposed for aligning two protein sequence profiles, with reported improvements in alignment accuracy and homolog discrimination versus sequence-sequence methods (e.g. BLAST) and profile-sequence methods (e.g. PSI-BLAST). Profile-profile alignment is also the iterated step in progressive multiple sequence alignment algorithms such as CLUSTALW. However, little is known about the relative performance of different profile-profile scoring functions. In this work, we evaluate the alignment accuracy of 23 different profile-profile scoring functions by comparing alignments of 488 pairs of sequences with identity ≤30% against structural alignments. We optimize parameters for all scoring functions on the same training set and use profiles of alignments from both PSI-BLAST and SAM-T99. Structural alignments are constructed from a consensus between the FSSP database and CE structural aligner. We compare the results with sequence-sequence and sequence-profile methods, including BLAST and PSI-BLAST.
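To make the notion of a profile-profile scoring function concrete, here is a minimal Python sketch of two generic ways to score a pair of profile columns. It is an illustration only: the abstract does not list the 23 functions evaluated, so the two shown here (a dot product and the Jensen-Shannon divergence), the pseudocount scheme, and all function names are assumptions, not the authors' implementation.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_frequencies(residues, pseudocount=1.0):
    """Convert one alignment column (a string of residues) into a
    probability vector over the 20 amino acids.

    A flat pseudocount is an assumption made for simplicity; real
    profile builders use more elaborate priors.
    """
    counts = {aa: pseudocount for aa in AMINO_ACIDS}
    for residue in residues:
        if residue in counts:  # gaps and unknown symbols are skipped
            counts[residue] += 1.0
    total = sum(counts.values())
    return [counts[aa] / total for aa in AMINO_ACIDS]


def dot_product_score(p, q):
    """Similarity of two columns: higher means more alike."""
    return sum(pi * qi for pi, qi in zip(p, q))


def jensen_shannon_divergence(p, q):
    """Dissimilarity of two columns in bits: 0 for identical columns,
    at most 1 for columns with no residue in common."""
    midpoint = [(pi + qi) / 2.0 for pi, qi in zip(p, q)]

    def kl(a, b):
        # Kullback-Leibler divergence; terms with a_i == 0 contribute 0.
        return sum(ai * math.log2(ai / bi) for ai, bi in zip(a, b) if ai > 0)

    return 0.5 * kl(p, midpoint) + 0.5 * kl(q, midpoint)


if __name__ == "__main__":
    hydrophobic_a = column_frequencies("LLLIV")
    hydrophobic_b = column_frequencies("LLIVV")
    acidic = column_frequencies("DDDEE")
    print(dot_product_score(hydrophobic_a, hydrophobic_b))
    print(dot_product_score(hydrophobic_a, acidic))
    print(jensen_shannon_divergence(hydrophobic_a, acidic))
```

In a profile-profile aligner, a column score of this kind would take the place of the substitution-matrix lookup in the dynamic programming recursion. A divergence such as the second function would first have to be converted into a similarity, for example by subtracting it from a constant.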

RESULTS

We find that profile-profile alignment gives an average improvement over our test set of typically 2-3% over profile-sequence alignment and approximately 40% over sequence-sequence alignment. No statistically significant difference is seen in the relative performance of most of the scoring functions tested. Significantly better results are obtained with profiles constructed from SAM-T99 alignments than from PSI-BLAST alignments.
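The improvements reported above are differences in alignment accuracy measured against structural reference alignments. The abstract does not say which accuracy measure was used. One common choice, assumed here purely for illustration, is the fraction of residue pairs in the reference alignment that the test alignment reproduces. The function names and the gapped-string input format in this sketch are hypothetical.

```python
def aligned_pairs(row_a, row_b, gap="-"):
    """Return the set of (i, j) residue index pairs aligned by a pairwise
    alignment given as two gapped strings of equal length."""
    if len(row_a) != len(row_b):
        raise ValueError("alignment rows must have the same length")
    pairs = set()
    i = j = 0  # positions in the ungapped sequences
    for char_a, char_b in zip(row_a, row_b):
        if char_a != gap and char_b != gap:
            pairs.add((i, j))
        if char_a != gap:
            i += 1
        if char_b != gap:
            j += 1
    return pairs


def fraction_of_reference_pairs(test_pairs, reference_pairs):
    """Fraction of reference-aligned residue pairs that also appear in
    the test alignment; 0.0 if the reference aligns nothing."""
    if not reference_pairs:
        return 0.0
    return len(test_pairs & reference_pairs) / len(reference_pairs)


if __name__ == "__main__":
    reference = aligned_pairs("ACD-EF", "AC-GEF")
    test = aligned_pairs("ACDEF--", "AC--GEF")
    print(fraction_of_reference_pairs(test, reference))
```

Averaging such a score over all sequence pairs in a test set gives a single number per method, which is the kind of quantity a relative improvement like "2-3%" would compare.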

AVAILABILITY

Source code, reference alignments and more detailed results are freely available at http://phylogenomics.berkeley.edu/profilealignment/
