Miyazawa S
Faculty of Technology, Gunma University, Japan.
Protein Eng. 1995 Oct;8(10):999-1009. doi: 10.1093/protein/8.10.999.
Probabilities of all possible correspondences of residues in aligning two proteins are evaluated by assuming that the statistical weight of each alignment is proportional to the exponential of its total similarity score. Based on these probabilities, a probability alignment that includes the most probable correspondences is proposed. For highly similar sequence pairs, the probability alignments agree with the maximum similarity alignments, that is, the alignments with the maximum similarity score. Significant correspondences in the probability alignments are those with probabilities > 0.5. The probability alignment method is applied to a few protein pairs, and the results indicate that the highly probable correspondences in the probability alignments are likely to be correct, agreeing with the structural alignments, and that incorrect correspondences in the maximum similarity alignments usually appear as insignificant correspondences in the probability alignments. The root mean square deviations in the superimposition of corresponding residues tend to be smaller for the significant correspondences in the probability alignments than for all correspondences in the maximum similarity alignments, indicating that incorrect correspondences in the maximum similarity alignments tend to be insignificant in the probability alignments. This is also confirmed for 109 pairs of similar proteins with sequence identities between 35 and 90%. In addition, the probability alignment method may predict correct correspondences better than the maximum similarity alignment method. Probability alignments do, of course, depend on the scoring scheme, but they are less sensitive to the values of parameters such as gap penalties. The present probability alignment method is useful for constructing reliable alignments based on the probabilities of correspondences and can be used with any scoring scheme.
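The correspondence probabilities described above can be computed by summing alignment weights with a forward-backward (partition-function) recursion. The following Python sketch is an illustration under stated assumptions, not the author's implementation: it assumes global alignment, a linear penalty `gap` per gapped residue, a hypothetical identity-based `toy_score` in place of a real substitution matrix, and a hypothetical scale factor `beta` applied to the score. It returns `P[i][j]`, the probability that residue `a[i]` corresponds to `b[j]`, and keeps the correspondences with probability > 0.5 as significant.

```python
import math

NEG_INF = float("-inf")


def _logsumexp(terms):
    """Numerically stable log(sum(exp(t) for t in terms))."""
    top = max(terms)
    if top == NEG_INF:
        return NEG_INF
    return top + math.log(sum(math.exp(t - top) for t in terms))


def match_probabilities(a, b, score, gap=4.0, beta=1.0):
    """Return P with P[i][j] = probability that a[i] is aligned to b[j].

    Assumption: every global alignment gets weight exp(beta * S), where S is
    the sum of pair scores minus `gap` per residue placed opposite a gap.
    """
    n, m = len(a), len(b)
    g = -beta * gap  # log-weight contributed by one gapped residue

    # F[i][j]: log of the summed weights of all alignments of a[:i] with b[:j].
    F = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    F[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            terms = []
            if i > 0 and j > 0:
                terms.append(F[i - 1][j - 1] + beta * score(a[i - 1], b[j - 1]))
            if i > 0:
                terms.append(F[i - 1][j] + g)
            if j > 0:
                terms.append(F[i][j - 1] + g)
            if terms:
                F[i][j] = _logsumexp(terms)

    # B[i][j]: log of the summed weights of all alignments of a[i:] with b[j:].
    B = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    B[n][m] = 0.0
    for i in range(n, -1, -1):
        for j in range(m, -1, -1):
            terms = []
            if i < n and j < m:
                terms.append(B[i + 1][j + 1] + beta * score(a[i], b[j]))
            if i < n:
                terms.append(B[i + 1][j] + g)
            if j < m:
                terms.append(B[i][j + 1] + g)
            if terms:
                B[i][j] = _logsumexp(terms)

    log_z = F[n][m]  # log partition function over all global alignments
    # Summed weight of alignments that pair a[i] with b[j], normalised by Z.
    return [
        [math.exp(F[i][j] + beta * score(a[i], b[j]) + B[i + 1][j + 1] - log_z)
         for j in range(m)]
        for i in range(n)
    ]


def significant_pairs(P, threshold=0.5):
    """Correspondences (i, j) whose probability exceeds `threshold`."""
    return [(i, j) for i, row in enumerate(P)
            for j, p in enumerate(row) if p > threshold]


def toy_score(x, y):
    # Hypothetical scoring: +5 for identical residues, -3 otherwise.
    return 5.0 if x == y else -3.0


if __name__ == "__main__":
    P = match_probabilities("HEAGAWGHEE", "PAWHEAE", toy_score)
    print(significant_pairs(P))
```

Two correspondences that share a residue, or that cross each other, cannot occur in the same alignment, so their probabilities sum to at most 1 and at most one of them can exceed 0.5. The significant correspondences therefore always form a consistent (possibly partial) alignment. For long sequences, working in log space as above avoids overflow of the exponential weights.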