Thompson MJ, Goldstein RA
Biophysics Research Division, University of Michigan, Ann Arbor 48109-1055, USA.
Proteins. 1996 May;25(1):28-37. doi: 10.1002/(SICI)1097-0134(199605)25:1<28::AID-PROT3>3.0.CO;2-G.
Using an information-theoretic formalism, we optimize classes of amino acid substitution to be maximally indicative of local protein structure. Our statistically derived classes loosely correspond to the heuristically constructed classes found in previously published work. However, whereas those methods provide a more rigid idealization of physicochemically constrained residue substitution, our classes provide substantially more structural information with many fewer parameters. Moreover, these substitution classes are consistent with the paradigmatic view of the sequence-to-structure relationship in globular proteins, which holds that the three-dimensional architecture is predominantly determined by the arrangement of hydrophobic and polar side chains, with weak constraints on the actual amino acid identities. More specific constraints are imposed on the placement of prolines, glycines, and charged residues. These substitution classes have been used in highly accurate predictions of residue solvent accessibility. They could also be used to identify homologous proteins, to construct and refine multiple sequence alignments, and to condense and codify the information in multiple sequence alignments for secondary structure prediction and tertiary fold recognition.
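The abstract does not spell out the objective or the search procedure, so the following is only a toy sketch of the general idea. It assumes that "maximally indicative" is measured as the mutual information I(C;S), in bits, between a residue's substitution class C and a discrete local-structure state S. The four-residue alphabet, the buried/exposed states, the counts, and the function names are all hypothetical, and the exhaustive search is a stand-in that only scales to a handful of residues. It is not the authors' procedure.

```python
import math
from collections import defaultdict
from itertools import product

# Hypothetical toy counts of (residue, structural state) observations.
TOY_COUNTS = {
    ("L", "buried"): 40, ("L", "exposed"): 10,
    ("I", "buried"): 45, ("I", "exposed"): 5,
    ("K", "buried"): 5,  ("K", "exposed"): 45,
    ("E", "buried"): 10, ("E", "exposed"): 40,
}


def mutual_information(joint: dict) -> float:
    """I(C;S) in bits from a dict {(class, state): count}."""
    total = sum(joint.values())
    p_class: dict = defaultdict(float)
    p_state: dict = defaultdict(float)
    for (c, s), n in joint.items():
        p_class[c] += n / total
        p_state[s] += n / total
    mi = 0.0
    for (c, s), n in joint.items():
        if n == 0:
            continue  # zero cells contribute nothing to the sum
        p = n / total
        mi += p * math.log2(p / (p_class[c] * p_state[s]))
    return mi


def class_joint(counts: dict, assignment: dict) -> dict:
    """Collapse residue-level counts into class-level counts."""
    joint: dict = defaultdict(float)
    for (aa, s), n in counts.items():
        joint[(assignment[aa], s)] += n
    return joint


def best_partition(counts: dict, residues: list, k: int):
    """Exhaustively try every assignment of residues to k non-empty classes
    and return (best_mi, assignment). Cost grows as k**len(residues)."""
    best_mi, best_assignment = -1.0, None
    for labels in product(range(k), repeat=len(residues)):
        if len(set(labels)) != k:  # skip assignments that leave a class empty
            continue
        assignment = dict(zip(residues, labels))
        mi = mutual_information(class_joint(counts, assignment))
        if mi > best_mi:
            best_mi, best_assignment = mi, assignment
    return best_mi, best_assignment


best_mi, best_assignment = best_partition(TOY_COUNTS, ["L", "I", "K", "E"], 2)
print(best_mi, best_assignment)
```

On these made-up counts the search groups L with I and K with E, a hydrophobic/polar split. Because each class is a function of residue identity, merging classes can never increase I(C;S). The number of classes k therefore has to be fixed in advance (or penalized) rather than optimized freely.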