Nishikawa K, Kubota Y, Ooi T
J Biochem. 1983 Sep;94(3):997-1007. doi: 10.1093/oxfordjournals.jbchem.a134443.
Correlations of the amino acid composition of a protein to its location in an organism, biological function, folding type, and disulfide bond(s) were examined for 356 proteins. In the present data set, 325 proteins of known location and biological character were divided into 122 intracellular enzymes (BI), 73 intracellular nonenzymes (BII), 45 extracellular enzymes (BIII), and 85 extracellular nonenzymes (BIV). The compositions of these proteins were expressed as points in a composition space of 18 orthogonal axes, each representing the content of an amino acid. The distributions of points of BI and BIII were narrow and approximately spherical, but those of BII and BIV were rather wide. The groups are separated from each other in the space. We divided the space into four regions (A1 to A4) corresponding to the groups BI to BIV. A protein could be assigned to one of the four groups (A1 to A4) from its amino acid composition: the correctly assigned proteins amounted to 177 out of 195 intracellular proteins and 94 out of 130 extracellular proteins. The correspondence was about 80% for classification into intracellular and extracellular proteins and 66% for that into the four groups. The folding type also had a significant correlation to the above groups, i.e., intracellular enzymes are rich in alpha/beta, intracellular nonenzymes in alpha, extracellular enzymes in beta and alpha + beta, and extracellular nonenzymes in beta. The differences in average composition between intra- and extracellular proteins, and between enzymes and nonenzymes, were related to structural characteristics, i.e., intracellular proteins contain more amino acids favoring alpha-helix than extracellular ones, and enzymes contain more hydrophobic amino acids than nonenzymes. The statistics on 213 Cys-containing proteins showed that disulfide bond(s) are found mostly (90%) in the extracellular proteins.
The results indicate that amino acid composition is well correlated to location in an organism, biological function, folding type, and disulfide bonding. The implications of the new findings are discussed from the protein-taxonomical point of view, and the validity of the present method is assessed.
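The idea of treating a protein as a point in an 18-axis composition space and assigning it to one of the regions A1 to A4 can be illustrated with a minimal Python sketch. The abstract does not say which 18 axes were used or how the region boundaries were drawn, so everything specific here is an assumption: the 18 axes are obtained by pooling Asn with Asp and Gln with Glu, the assignment rule is plain nearest-centroid by Euclidean distance, and the names `AXES`, `POOL`, `composition`, and `nearest_group` are hypothetical. The centroids in the usage lines are toy values, not data from the paper.

```python
from collections import Counter
import math

# Assumption: 18 axes from the 20 standard residues, with Asn pooled into Asp
# and Gln pooled into Glu. The abstract does not specify the actual axes.
AXES = "ACDEFGHIKLMPRSTVWY"
POOL = {"N": "D", "Q": "E"}


def composition(sequence):
    """Return the fractional amino acid composition as an 18-component point."""
    counts = Counter(POOL.get(res, res) for res in sequence.upper())
    total = sum(counts[axis] for axis in AXES)
    if total == 0:
        raise ValueError("sequence contains no recognised residues")
    return [counts[axis] / total for axis in AXES]


def nearest_group(point, centroids):
    """Assign a point to the group whose centroid is closest (Euclidean).

    Assumption: nearest-centroid is a stand-in for the region boundaries,
    which the abstract does not describe.
    """
    return min(centroids, key=lambda group: math.dist(point, centroids[group]))


# Toy centroids for illustration only; real centroids would be the mean
# compositions of the proteins in each group.
toy_centroids = {
    "A1": composition("AAAA"),
    "A3": composition("GGGG"),
}
assigned = nearest_group(composition("AAAG"), toy_centroids)
```

With real data, each centroid would be computed from the proteins of one group (BI to BIV), and the fraction of proteins falling in their own group's region would correspond to the assignment rates quoted in the abstract.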