National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA.
Bioinformatics. 2011 Dec 15;27(24):3356-63. doi: 10.1093/bioinformatics/btr565. Epub 2011 Oct 13.
Pairwise protein sequence alignments are generally evaluated using scores defined as the sum of substitution scores for aligning amino acids to one another, and gap scores for aligning runs of amino acids in one sequence to null characters inserted into the other. Protein profiles may be abstracted from multiple alignments of protein sequences, and substitution and gap scores have been generalized to the alignment of such profiles either to single sequences or to other profiles. Although there is widespread agreement on the general form substitution scores should take for profile-sequence alignment, little consensus has been reached on how best to construct profile-profile substitution scores, and a large number of these scoring systems have been proposed. Here, we assess a variety of such substitution scores. For this evaluation, given a gold-standard set of multiple alignments, we calculate the probability that a profile column yields a higher substitution score when aligned to a related column than when aligned to an unrelated one. We also generalize this measure to sets of two or three adjacent columns. This simple approach has two advantages: it does not depend primarily upon the gold-standard alignment columns with the weakest empirical support, and it does not need to fit gap and offset costs for use with each substitution score studied.
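The evaluation measure described above can be illustrated with a minimal sketch. It assumes the substitution scores for related and unrelated column pairs have already been computed; the function name `prob_related_scores_higher` and the choice to count ties as one half are illustrative assumptions, not details taken from the paper.

```python
from itertools import product


def prob_related_scores_higher(related_scores, unrelated_scores):
    """Estimate P(score to a related column > score to an unrelated column).

    Every related score is compared with every unrelated score.
    Ties count as 0.5 (an assumption made for this sketch).
    """
    if not related_scores or not unrelated_scores:
        raise ValueError("both score lists must be non-empty")

    wins = 0.0
    for related, unrelated in product(related_scores, unrelated_scores):
        if related > unrelated:
            wins += 1.0
        elif related == unrelated:
            wins += 0.5
    return wins / (len(related_scores) * len(unrelated_scores))


# Toy scores, invented for illustration only.
example = prob_related_scores_higher([3.0, 1.0], [2.0, 0.0])
```

A value near 1 means the scoring system almost always ranks related columns above unrelated ones, while 0.5 corresponds to no discrimination. Because only the ordering of scores matters, no gap or offset costs need to be fitted.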
A simple symmetrization of mean profile-sequence scores usually performed best. It was followed closely by several specific scoring systems constructed using a variety of rationales.
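One possible reading of "symmetrization of mean profile-sequence scores" is sketched below. The abstract does not give the formula, so everything here is an assumption: each profile column is a dictionary of residue frequencies (all positive, as after pseudocounts), the profile-sequence score of a residue is a log-odds value against a background distribution, and the symmetrized score is the plain average of the two directed means. All function names are hypothetical.

```python
import math


def log_odds(column, background):
    """Per-residue log-odds scores for one profile column (assumed form)."""
    return {a: math.log(column[a] / background[a]) for a in background}


def mean_profile_sequence_score(scoring_col, residue_col, background):
    """Mean score of scoring_col against residues drawn from residue_col."""
    scores = log_odds(scoring_col, background)
    return sum(residue_col[a] * scores[a] for a in background)


def symmetrized_score(p, q, background):
    """Average of the two directed mean profile-sequence scores (assumed)."""
    return 0.5 * (
        mean_profile_sequence_score(p, q, background)
        + mean_profile_sequence_score(q, p, background)
    )


# Toy three-letter alphabet, invented for illustration only.
background = {"A": 1 / 3, "B": 1 / 3, "C": 1 / 3}
p = {"A": 0.7, "B": 0.2, "C": 0.1}
q = {"A": 0.1, "B": 0.2, "C": 0.7}

self_score = symmetrized_score(p, p, background)
cross_score = symmetrized_score(p, q, background)
```

In this toy setup a column scored against itself gets a positive value, while two columns with opposite residue preferences get a lower one. The actual scoring systems compared in the paper may differ in how frequencies, weights, and backgrounds are defined.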
Supplementary data are available at Bioinformatics online.