Bastolla Ugo, Porto Markus, Roman H Eduardo, Vendruscolo Michele
Centro de Astrobiología (INTA-CSIC), 28850 Torrejón de Ardoz, Spain.
Proteins. 2005 Jan 1;58(1):22-30. doi: 10.1002/prot.20240.
With the aim of studying the relationship between protein sequences and their native structures, we adopted vectorial representations for both sequence and structure. The structural representation was based on the principal eigenvector of the fold's contact matrix (PE). As has been recently shown, the PE encodes sufficient information for reconstructing the whole contact matrix. The sequence was represented through a hydrophobicity profile (HP), using a generalized hydrophobicity scale that we obtained from the principal eigenvector of a residue-residue interaction matrix and that we call the interactivity scale. Using this novel scale, we defined the optimal HP of a protein fold and, by means of stability arguments, predicted it to be strongly correlated with the PE of the fold's contact matrix. This prediction was confirmed through an evolutionary analysis, which showed that the PE correlates with the HP of each individual sequence adopting the same fold and, even more strongly, with the average HP of this set of sequences. Thus, protein sequences evolve in such a way that their average HP is close to the optimal one, implying that neutral evolution can be viewed as a kind of motion in sequence space around the optimal HP. Our results indicate that the correlation coefficient between N-dimensional vectors constitutes a natural metric in the vectorial space in which we represent both protein sequences and protein structures, which we call vectorial protein space. In this way, we define a unified framework for sequence-to-sequence, sequence-to-structure and structure-to-structure alignments. We show that the interactivity scale is nearly optimal both for the comparison of sequences to sequences and for the comparison of sequences to structures.
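The comparison described in the abstract can be illustrated with a minimal Python sketch, offered only as an assumption-laden toy and not as the authors' implementation. It computes the principal eigenvector of a small symmetric contact matrix by shifted power iteration, builds a hydrophobicity profile for a sequence, and reports the Pearson correlation between the two vectors. The contact list, the sequence, and the values in `TOY_SCALE` are hypothetical placeholders; in particular, `TOY_SCALE` is not the interactivity scale, whose values are not given here.

```python
import math

def contact_matrix(n, pairs):
    """Build a symmetric 0/1 contact matrix for n residues from contact pairs."""
    c = [[0.0] * n for _ in range(n)]
    for i, j in pairs:
        c[i][j] = 1.0
        c[j][i] = 1.0
    return c

def principal_eigenvector(matrix, iterations=1000, tol=1e-12):
    """Principal eigenvector of a symmetric non-negative matrix.

    Power iteration is applied to (matrix + identity). The shift keeps the
    eigenvectors unchanged but makes the largest eigenvalue strictly dominant,
    which avoids oscillation when the contact graph is bipartite.
    """
    n = len(matrix)
    v = [1.0 / math.sqrt(n)] * n
    for _ in range(iterations):
        w = [v[i] + sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        w = [x / norm for x in w]
        converged = max(abs(a - b) for a, b in zip(v, w)) < tol
        v = w
        if converged:
            break
    return v

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        raise ValueError("correlation is undefined for a constant vector")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical per-residue scale: placeholder numbers, NOT the interactivity scale.
TOY_SCALE = {"L": 1.0, "V": 0.9, "A": 0.3, "K": -0.8, "E": -0.9}

def hydrophobicity_profile(sequence, scale):
    """Map each residue of the sequence to its value on the given scale."""
    return [scale[residue] for residue in sequence]

# Hypothetical 8-residue fold: a connected toy contact graph.
toy_pairs = [(0, 3), (0, 4), (1, 4), (1, 5), (2, 5),
             (2, 6), (3, 6), (3, 7), (0, 7), (1, 6)]
contacts = contact_matrix(8, toy_pairs)
pe = principal_eigenvector(contacts)

toy_sequence = "LKAVLKEA"  # hypothetical sequence, same length as the fold
hp = hydrophobicity_profile(toy_sequence, TOY_SCALE)

r = pearson(pe, hp)
print("PE:", [round(x, 3) for x in pe])
print("correlation between PE and HP:", round(r, 3))
```

Because both the structure and the sequence are reduced to vectors of the same length, the same `pearson` function could in principle score sequence-to-sequence, sequence-to-structure, or structure-to-structure pairs. The sketch ignores alignment gaps and sequences of different lengths.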