Génomique Analytique, Université Pierre et Marie Curie, INSERM U511, 91, Bd de l'Hôpital, 75013 Paris, France.
Evol Bioinform Online. 2008 Oct 9;4:255-61. doi: 10.4137/ebo.s885.
The adequacy of substitution matrices for modeling evolutionary relationships between amino acid sequences can be evaluated numerically by checking whether the triangle inequality holds for all triplets of residues. After substitution scores are converted into distances, one can verify that, in the amino acid space modeled by the matrix, the direct path between two amino acids is no longer than a path passing through a third amino acid. When the triangle inequality is violated, the intuition is that the matrix models the evolutionary signal poorly, that the space is locally inconsistent, and that the matrix was probably constructed from insufficient biological data. A previous analysis of several substitution matrices revealed that the number of triplets violating the triangle inequality increases with sequence divergence. Here, we compare matrices dedicated to the alignment of highly divergent proteins. The triangle inequality is tested on several classical substitution matrices as well as on a pair of "complementary" substitution matrices that record the evolutionary pressures inside and outside hydrophobic blocks in protein sequences. The analysis demonstrates the crucial role of hydrophobic residues in substitution matrices dedicated to the alignment of distantly related proteins.
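The triplet test described above can be illustrated with a minimal Python sketch. Two things in it are assumptions and do not come from the abstract: the score-to-distance conversion d(a, b) = s(a, a) + s(b, b) - 2 s(a, b), which is one common choice and may differ from the one the authors used, and the three-residue toy scores, which are invented for illustration and not taken from any published matrix. The function names are likewise hypothetical.

```python
from itertools import permutations


def scores_to_distances(scores, residues):
    """Convert symmetric substitution scores into distances.

    ASSUMPTION: uses d(a, b) = s(a, a) + s(b, b) - 2 * s(a, b).
    The abstract does not state which conversion the authors applied.
    """
    def s(a, b):
        # Scores are stored once per unordered pair; accept either key order.
        return scores[(a, b)] if (a, b) in scores else scores[(b, a)]

    return {
        (a, b): s(a, a) + s(b, b) - 2 * s(a, b)
        for a in residues
        for b in residues
    }


def triangle_violations(dist, residues):
    """List ordered triplets (a, b, c) for which the detour a -> b -> c
    is strictly shorter than the direct path a -> c."""
    return [
        (a, b, c)
        for a, b, c in permutations(residues, 3)
        if dist[(a, c)] > dist[(a, b)] + dist[(b, c)]
    ]


# Toy scores, invented for illustration only (not from a real matrix).
residues = ["A", "K", "L"]
scores = {
    ("A", "A"): 4, ("K", "K"): 5, ("L", "L"): 4,
    ("A", "L"): 2, ("A", "K"): 1, ("K", "L"): -4,
}

dist = scores_to_distances(scores, residues)
violations = triangle_violations(dist, residues)
print(violations)
```

With these toy values the K to L distance (17) exceeds the sum of the K to A and A to L distances (7 + 4), so the triplet is reported once in each direction. For a real 20-residue matrix the same loop runs over 20 × 19 × 18 ordered triplets; counting unordered triplets instead would need each violation to be deduplicated.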