Wood T C, Pearson W R
Department of Biochemistry, University of Virginia, Charlottesville, VA, 22908, USA.
J Mol Biol. 1999 Aug 27;291(4):977-95. doi: 10.1006/jmbi.1999.2972.
The relationship between sequence similarity and structural similarity has been examined in 36 protein families with five or more diverse members whose structures are known. The structural similarity within a family (as determined with the DALI structure comparison program) is linearly related to sequence similarity (as determined by a Smith-Waterman search of the protein sequences in the structure database). The correlation between structural similarity and sequence similarity is very high; 18 of the 36 families had linear correlation coefficients r ≥ 0.878, and only nine had correlation coefficients r ≤ 0.815. Inclusion of higher-order terms in the structure/sequence relationship improved the fit by less than 7% in 27 of the 36 families. Differences in sequence/structure correlations are distributed evenly among the four protein structural classes: alpha, beta, alpha/beta, and alpha+beta. While most protein families show high correlations between sequence similarity and structural similarity, the amount of structural change per sequence change, i.e. the structural mutation sensitivity, varies almost fourfold. Protein families with high and low structural mutation sensitivity are distributed evenly among protein structure classes. In addition, we did not detect strong correlations between structural mutation sensitivity and either protein family mutation rates or protein size. Our results are more consistent with models of protein structure that encode a protein family's fold throughout the protein sequence, and not just in a few critical residues.
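To illustrate the kind of per-family analysis the abstract describes, the following is a minimal Python sketch, not the authors' code. It computes a linear correlation coefficient r between sequence-similarity and structural-similarity scores, then fits a least-squares line to them. Reading the slope of that line as the "structural mutation sensitivity" is an assumption made here for illustration; the abstract does not give the exact definition. The function names and the numbers in the example family are invented and are not data from the paper.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Linear (Pearson) correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def linear_fit(xs, ys):
    """Ordinary least-squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical family: invented scores, NOT values from the paper.
# x: sequence-similarity scores (e.g. Smith-Waterman), y: structural-similarity scores (e.g. DALI).
sequence_scores = [100.0, 200.0, 300.0, 400.0, 500.0]
structure_scores = [10.5, 19.0, 31.0, 39.5, 50.0]

r = pearson_r(sequence_scores, structure_scores)
slope, intercept = linear_fit(sequence_scores, structure_scores)
print(f"r = {r:.3f}, slope = {slope:.4f}, intercept = {intercept:.3f}")
```

Running the same two functions on each family's score pairs would give one r and one slope per family, which could then be compared across structural classes.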