Zhang C T, Chou K C
Upjohn Laboratories, Kalamazoo, Michigan 49001.
Protein Sci. 1992 Mar;1(3):401-8. doi: 10.1002/pro.5560010312.
Proteins are generally classified into four structural classes: all-alpha proteins, all-beta proteins, alpha + beta proteins, and alpha/beta proteins. In this article, a protein is expressed as a vector in a 20-dimensional space, whose 20 components are defined by the protein's composition in the 20 amino acids. Based on this representation, a new method, the so-called maximum component coefficient method, is proposed for predicting the structural class of a protein from its amino acid composition. Compared with the existing methods, the new method yields a higher overall prediction accuracy. For the all-alpha proteins in particular, the rate of correct prediction obtained by the new method is much higher than that of any existing method. For instance, for the 19 all-alpha proteins investigated previously by P.Y. Chou, the rate of correct prediction by his method was 84.2%, whereas the rate obtained with the new method is 100%. Furthermore, the new method has a clear physical interpretation: the vector representing the protein to be predicted is decomposed into four component vectors, each of which corresponds to the norm of one of the four protein structural classes.
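The decomposition described above can be illustrated with a minimal sketch. The abstract does not state how the coefficients are computed, so the sketch assumes an unconstrained least-squares fit of the protein's composition vector onto the four class norm vectors and assigns the class with the largest coefficient; any constraints used in the article itself are not reproduced. The function names and the `TOY_NORMS` values are hypothetical and purely illustrative; they are not the class norms reported by the authors.

```python
CLASSES = ["all-alpha", "all-beta", "alpha+beta", "alpha/beta"]

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("class norm vectors are linearly dependent")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / M[i][i]
    return x

def component_coefficients(x, norms):
    """Least-squares coefficients c minimizing |x - sum_i c_i * norms[i]|^2.

    Assumption: plain normal equations, with no constraints on c.
    """
    gram = [[dot(ni, nj) for nj in norms] for ni in norms]
    rhs = [dot(ni, x) for ni in norms]
    return solve(gram, rhs)

def predict_class(x, norms):
    """Return the class whose component coefficient is largest."""
    coeffs = component_coefficients(x, norms)
    best = max(range(len(coeffs)), key=lambda i: coeffs[i])
    return CLASSES[best], coeffs

def toy_norm(k):
    """Hypothetical 20-component composition enriched in residues 5k..5k+4."""
    return [0.14 if 5 * k <= i < 5 * k + 5 else 0.02 for i in range(20)]

# Made-up class norms, one per structural class (NOT values from the article).
TOY_NORMS = [toy_norm(k) for k in range(4)]

# A synthetic "protein" built as a known mixture of the toy norms.
weights = [0.1, 0.7, 0.1, 0.1]
protein = [sum(w * n[i] for w, n in zip(weights, TOY_NORMS)) for i in range(20)]

label, coeffs = predict_class(protein, TOY_NORMS)
print(label, [round(c, 3) for c in coeffs])
```

Because the synthetic protein is an exact mixture of the toy norms, the fit recovers the mixing weights and the largest one decides the class. With real data, each class norm would be estimated from a training set, for example as the average composition of the proteins known to belong to that class.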