West M W, Hecht M H
Department of Chemistry, Princeton University, Princeton, New Jersey 08544-1009, USA.
Protein Sci. 1995 Oct;4(10):2032-9. doi: 10.1002/pro.5560041008.
Protein sequences can be represented as binary patterns of polar ([symbol: see text]) and nonpolar ([symbol: see text]) amino acids. These binary sequence patterns are categorized into two classes: Class A patterns match the structural repeat of an idealized amphiphilic alpha-helix (3.6 residues per turn), and class B patterns match the structural repeat of an idealized amphiphilic beta-strand (2 residues per turn). The difference between these two classes of sequence patterns has led to a strategy for de novo protein design based on binary patterning of polar and nonpolar amino acids. Here we ask whether similar binary patterning is incorporated in the sequences and structures of natural proteins. Analysis of the Protein Data Bank demonstrates the following. (1) Class A sequence patterns occur considerably more frequently in the sequences of natural proteins than would be expected at random, but class B patterns occur less often than expected. (2) Each pattern is found predominantly in the secondary structure expected from the binary strategy for protein design. Thus, class A patterns are found more frequently in alpha-helices than in beta-strands, and class B patterns are found more frequently in beta-strands than in alpha-helices. (3) Among the alpha-helices of natural proteins, the most commonly used binary patterns are indeed the class A patterns. (4) Among all beta-strands in the database, the most commonly used binary patterns are not the expected class B patterns. (5) However, for solvent-exposed beta-strands, the correlation is striking: All beta-strands in the database that contain the class B patterns are exposed to solvent. (ABSTRACT TRUNCATED AT 250 WORDS)
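The binary representation described in the abstract can be illustrated with a minimal Python sketch. Everything specific in it is an assumption made for illustration and is not taken from the paper: the nonpolar residue set `NONPOLAR`, the letters `P` and `N` used for polar and nonpolar positions, the 100-degree-per-residue rule used to generate a helix-like pattern (360/3.6), and the simple composition-based estimate of the count expected at random. The abstract does not state which residue assignment, pattern lengths, or null model the authors used.

```python
# Assumed nonpolar residue set (hypothetical; the abstract does not give
# the authors' polar/nonpolar assignment).
NONPOLAR = set("AVLIMFWC")


def to_binary(sequence: str) -> str:
    """Map a one-letter amino acid sequence to 'N' (nonpolar) / 'P' (polar)."""
    return "".join("N" if aa in NONPOLAR else "P" for aa in sequence.upper())


def helix_pattern(length: int) -> str:
    """Helix-like pattern: one residue every 100 degrees (3.6 per turn).

    Illustrative rule: a position is nonpolar when its angle around the
    helix axis falls on one assumed 180-degree face.
    """
    return "".join("N" if (i * 100) % 360 < 180 else "P" for i in range(length))


def strand_pattern(length: int, first: str = "P") -> str:
    """Strand-like pattern: polar and nonpolar alternate (period of 2)."""
    other = "N" if first == "P" else "P"
    return "".join(first if i % 2 == 0 else other for i in range(length))


def count_overlapping(binary: str, pattern: str) -> int:
    """Count occurrences of pattern in binary, allowing overlaps."""
    windows = len(binary) - len(pattern) + 1
    return sum(binary.startswith(pattern, i) for i in range(max(0, windows)))


def expected_count(binary: str, pattern: str) -> float:
    """Expected occurrences if positions were independent (assumed null model)."""
    windows = max(0, len(binary) - len(pattern) + 1)
    if windows == 0:
        return 0.0
    f_nonpolar = binary.count("N") / len(binary)
    probability = 1.0
    for symbol in pattern:
        probability *= f_nonpolar if symbol == "N" else 1.0 - f_nonpolar
    return windows * probability


if __name__ == "__main__":
    binary = to_binary("KVKVKVEELLKKAEELLKKA")  # made-up sequence
    for name, pattern in [("helix-like", helix_pattern(7)),
                          ("strand-like", strand_pattern(6))]:
        print(name, pattern,
              count_overlapping(binary, pattern),
              round(expected_count(binary, pattern), 3))
```

Comparing `count_overlapping` with `expected_count` for a given pattern mirrors, in a very simplified way, the observed-versus-random comparison reported in finding (1); a real analysis would also need secondary-structure and solvent-exposure annotations for findings (2) through (5).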