Edgar Robert C, Sjölander Kimmen
Bioinformatics. 2004 May 22;20(8):1301-8. doi: 10.1093/bioinformatics/bth090. Epub 2004 Feb 12.
In recent years, several methods have been proposed for aligning two protein sequence profiles, with reported improvements in alignment accuracy and homolog discrimination over sequence-sequence methods (e.g. BLAST) and profile-sequence methods (e.g. PSI-BLAST). Profile-profile alignment is also the iterated step in progressive multiple sequence alignment algorithms such as CLUSTALW. However, little is known about the relative performance of different profile-profile scoring functions. In this work, we evaluate the alignment accuracy of 23 different profile-profile scoring functions by comparing alignments of 488 pairs of sequences with identity ≤30% against structural alignments. We optimize parameters for all scoring functions on the same training set and use profiles built from both PSI-BLAST and SAM-T99 alignments. Structural alignments are constructed from a consensus between the FSSP database and the CE structural aligner. We compare the results with sequence-sequence and profile-sequence methods, including BLAST and PSI-BLAST.
We find that, averaged over our test set, profile-profile alignment typically improves accuracy by 2-3% over profile-sequence alignment and by approximately 40% over sequence-sequence alignment. No statistically significant difference is seen in the relative performance of most of the scoring functions tested. Significantly better results are obtained with profiles constructed from SAM-T99 alignments than with those constructed from PSI-BLAST alignments.
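As an illustration of what comparing a sequence alignment against a structural alignment can involve, the sketch below computes the fraction of residue pairs in a reference alignment that a test alignment reproduces. This is only one common way to measure pairwise alignment accuracy and is an assumption here: the abstract does not state the exact measure used, and the function names `aligned_pairs` and `pair_accuracy` are hypothetical.

```python
from typing import Set, Tuple


def aligned_pairs(row_a: str, row_b: str, gap: str = "-") -> Set[Tuple[int, int]]:
    """Return the (i, j) residue-index pairs aligned by two gapped rows."""
    if len(row_a) != len(row_b):
        raise ValueError("alignment rows must have equal length")
    pairs = set()
    i = j = 0  # positions in the ungapped sequences
    for a, b in zip(row_a, row_b):
        if a != gap and b != gap:
            pairs.add((i, j))
        if a != gap:
            i += 1
        if b != gap:
            j += 1
    return pairs


def pair_accuracy(test: Tuple[str, str], reference: Tuple[str, str]) -> float:
    """Fraction of reference-aligned residue pairs also present in the test alignment.

    Assumed measure for illustration; not necessarily the one used in the paper.
    """
    ref_pairs = aligned_pairs(*reference)
    if not ref_pairs:
        return 0.0
    return len(aligned_pairs(*test) & ref_pairs) / len(ref_pairs)


if __name__ == "__main__":
    reference = ("ACDE", "ACDE")    # stands in for a structural alignment
    test = ("AC-DE", "ACD-E")       # stands in for a sequence-based alignment
    print(pair_accuracy(test, reference))
```

Averaging such a score over all sequence pairs in a test set gives a single number per method, which is the kind of quantity the percentage improvements above would be computed from.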
Source code, reference alignments and more detailed results are freely available at http://phylogenomics.berkeley.edu/profilealignment/